mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #80] Autoscroll before archiving and take full-height screenshots #3076
Originally created by @pirate on GitHub (Jun 19, 2018).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/80
I've submitted a Chromium bug tracker feature request for adding a `--full-page` flag: https://bugs.chromium.org/p/chromium/issues/detail?id=854013
Hopefully it's merged, allowing us to screenshot the full height of pages, instead of limiting them to the config settings defined by `DIMENSIONS`.
@pirate commented on GitHub (Mar 15, 2019):
This will be easy with user scripts the moment pyppeteer is merged in #177. Or if we switch to playwright it's also easy using playwright's `--full-page` flag. https://github.com/ArchiveBox/ArchiveBox/issues/51
@mtvu commented on GitHub (Jun 10, 2021):
The code provided in this playwright issue solves the full-page screenshot problem for me
https://github.com/microsoft/playwright/issues/620
Here is the code I use to take a full page screenshot with playwright
@timdonovanuk commented on GitHub (Jun 11, 2021):
Is this feature natively available now or only via hacking in user scripts?
@pirate commented on GitHub (Jun 11, 2021):
Not available natively yet, it's blocked on https://github.com/ArchiveBox/ArchiveBox/issues/51
@timdonovanuk commented on GitHub (Jun 11, 2021):
Ah fair enough, thanks! Seems like #51 encapsulates a whole ton of effort to make this happen, so thanks and good luck!
@DeoLeung commented on GitHub (Mar 7, 2025):
will be great to have the ability to take full height screenshots! any update on this after 4 years?
@pirate commented on GitHub (Mar 10, 2025):
My conclusion after a lot of work on this issue is that full-page screenshots up to ~8000px maximum height are ok, but many many pages are longer than that, and most common image formats actually don't support images that big. Even the formats that do (png) cause most image viewers to crash when you try to open them. You need to mess with Chrome's GPU memory settings to even get it to take more than 16,000px in one image, let alone the 90,000px+ that some long comment thread pages have.
Multiple screenshots are the better solution. My solution so far is one 4:3 screenshot at the top of the page, and then numbered 16:10 screenshots for like ~15 full-height scrolls down the page. Also works great for feeding it to vision and OCR models for analysis.
I built this ^ more advanced puppeteer based screenshot approach for a paying client last year, and it's still in active development. It's all in TS and ArchiveBox is all Python, so it takes time to bridge that gap, refactor, open source it, document it, package it, ship it, etc. for the public.
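For readers who want to experiment before that lands, the multiple-screenshot idea described above can be sketched roughly as follows. This is not the client implementation mentioned in the comment; it is a minimal Python sketch assuming Playwright's sync API is installed. The names `Shot`, `plan_shots`, `capture`, the 1440px width, and the output filenames are all hypothetical, and starting the numbered tiles at the top of the page is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class Shot:
    name: str      # output filename (hypothetical naming scheme)
    scroll_y: int  # vertical scroll offset in CSS pixels
    width: int     # viewport width
    height: int    # viewport height


def plan_shots(page_height: int, width: int = 1440, max_scrolls: int = 15) -> List[Shot]:
    """Plan one 4:3 shot at the top, then numbered 16:10 tiles down the page."""
    top_height = width * 3 // 4     # 4:3 viewport
    tile_height = width * 10 // 16  # 16:10 viewport
    shots = [Shot("screenshot_top.png", 0, width, top_height)]
    y, index = 0, 1
    # Assumption: tiles start at the top and stop after max_scrolls tiles.
    while y < page_height and index <= max_scrolls:
        shots.append(Shot(f"screenshot_{index:02d}.png", y, width, tile_height))
        y += tile_height
        index += 1
    return shots


def capture(url: str, out_dir: str, width: int = 1440) -> None:
    """Take the planned viewport-sized screenshots with Playwright."""
    # Imported lazily so plan_shots() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page_height = page.evaluate("document.documentElement.scrollHeight")
        for shot in plan_shots(page_height, width):
            page.set_viewport_size({"width": shot.width, "height": shot.height})
            page.evaluate("y => window.scrollTo(0, y)", shot.scroll_y)
            page.screenshot(path=str(Path(out_dir) / shot.name))
        browser.close()
```

Each image stays viewport-sized, so none of them approaches the height limits discussed above. The browser clamps scrolling at the bottom of the page, so the last tile may overlap the one before it.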
@pirate commented on GitHub (Jan 8, 2026):
`dev` now has an `infiniscroll` plugin out-of-the-box which scrolls the page up to N times and expands comments and detail blocks in the process. It doesn't implement the multiple screenshots approach yet, but it's a step in the right direction.
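As an illustration of the "scroll up to N times" behaviour only (this is not the plugin's code, and it does not expand comments or detail blocks), a bounded autoscroll loop might look like the sketch below. It assumes a Playwright-style `page` object exposing `evaluate` and `wait_for_timeout`; the function name `autoscroll` and its defaults are hypothetical.

```python
def autoscroll(page, max_scrolls: int = 10, pause_ms: int = 500) -> int:
    """Scroll to the bottom up to max_scrolls times.

    Stops early once the document height stops growing, which usually
    means no more content is being lazy-loaded. Returns the number of
    scrolls actually performed.
    """
    get_height = "document.documentElement.scrollHeight"
    last_height = page.evaluate(get_height)
    scrolls = 0
    for _ in range(max_scrolls):
        # Jump to the current bottom of the page to trigger lazy loading.
        page.evaluate("window.scrollTo(0, document.documentElement.scrollHeight)")
        # Give the page a moment to fetch and render new content.
        page.wait_for_timeout(pause_ms)
        scrolls += 1
        new_height = page.evaluate(get_height)
        if new_height == last_height:
            break  # nothing new appeared, so stop scrolling
        last_height = new_height
    return scrolls
```

Capping the loop matters on truly infinite feeds, where the height never stops growing; the height check only saves time on pages that do end.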