[GH-ISSUE #213] Config: Add a DELAY_AFTER_LOAD option to delay snapshot further after onload event #1655

Closed
opened 2026-03-01 17:52:36 +03:00 by kerem · 6 comments

Originally created by @fr0der1c on GitHub (Apr 10, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/213

Type

  • [x] General Question or Discussion
  • [ ] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

image: https://user-images.githubusercontent.com/16500161/55847831-78f6f180-5b7d-11e9-8739-c4ccf052f455.png
The PDF and screenshot are generated too early. As you can see, they show only a meaningless loading page instead of the actual content.

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [x] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually

@pirate commented on GitHub (Apr 10, 2019):

They're generated when the page onload event fires. If you're seeing pages show up empty in screenshots, it means those sites are doing some weird post-load AJAX to load their content (which is unfortunately more and more common lately).

We can try adding a config var like DELAY_AFTER_LOAD=5 to add a 5s delay after page onload, but I can't promise it'll be finished within the next month.
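
To make the timing problem concrete, here is a minimal simulation of it, not ArchiveBox code. `FakePage`, `snapshot`, and the `delay_after_load` parameter are hypothetical names invented for this sketch; the page's real content arrives a short time after its load step completes, as with post-load AJAX.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a page whose content arrives after onload."""

    def __init__(self, ajax_latency: float) -> None:
        self.ajax_latency = ajax_latency
        self.body = "Loading..."  # what the DOM holds when onload fires
        self._ajax_task = None

    async def load(self) -> None:
        # "onload" completes right away; the content is fetched in the background.
        self._ajax_task = asyncio.create_task(self._finish_ajax())

    async def _finish_ajax(self) -> None:
        await asyncio.sleep(self.ajax_latency)
        self.body = "Actual content"


async def snapshot(page: FakePage, delay_after_load: float) -> str:
    """Return the page body as seen delay_after_load seconds after onload."""
    await page.load()
    await asyncio.sleep(delay_after_load)
    return page.body


async def main() -> tuple:
    # Capture immediately at onload, then again with an extra delay.
    early = await snapshot(FakePage(ajax_latency=0.05), delay_after_load=0.0)
    late = await snapshot(FakePage(ajax_latency=0.05), delay_after_load=0.2)
    return early, late


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With no delay the capture still holds the loading placeholder; with a delay longer than the simulated AJAX latency it holds the content. A fixed delay only helps when it exceeds the slowest post-load request, which is why it is a blunt tool.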


@Joonas12334 commented on GitHub (Apr 6, 2024):

Any updates? The loading is still broken on some pages because of this.


@michael-haechler commented on GitHub (Mar 7, 2025):

+1 on this. Or are people solving this issue differently?


@pirate commented on GitHub (Mar 8, 2025):

Unfortunately there is no easy Chrome CLI flag to add a delay before the screenshot, DOM, or PDF are rendered. Until we complete the switch to puppeteer away from the simple CLI calling convention, I don't have a great solution yet. If you can share any of the domains where you're experiencing it particularly badly, I can maybe give some workaround recommendations.


@cedricdeboom commented on GitHub (May 8, 2025):

www.airbnb.com is particularly bad...


@pirate commented on GitHub (Dec 30, 2025):

OK, I've completed the switch to puppeteer on dev; you can now configure delays using a few options:

ArchiveBox.conf:

CHROME_PAGELOAD_TIMEOUT=60    # max timeout for page loading in seconds
CHROME_WAIT_FOR=networkidle2  # domcontentloaded | load | networkidle0 | networkidle2
CHROME_DELAY_AFTER_LOAD=5     # wait extra seconds after load before archiving
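
The four CHROME_WAIT_FOR values match the lifecycle events puppeteer accepts for its `waitUntil` navigation option. As a rough sketch of how the three settings combine, the snippet below parses the fragment above and computes an upper bound on the wait before archiving. It is not ArchiveBox's config loader: `parse_conf` and `worst_case_wait` are hypothetical helpers, and the assumption that the delay is simply added on top of the page-load timeout is inferred from the inline comments only.

```python
VALID_WAIT_FOR = {"domcontentloaded", "load", "networkidle0", "networkidle2"}

CONF_TEXT = """\
CHROME_PAGELOAD_TIMEOUT=60    # max timeout for page loading in seconds
CHROME_WAIT_FOR=networkidle2  # domcontentloaded | load | networkidle0 | networkidle2
CHROME_DELAY_AFTER_LOAD=5     # wait extra seconds after load before archiving
"""


def parse_conf(text: str) -> dict:
    """Parse KEY=VALUE lines, dropping trailing '#' comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def worst_case_wait(options: dict) -> float:
    """Upper bound in seconds: load timeout plus the extra post-load delay.

    Assumption: the delay starts only after the load wait has finished.
    """
    wait_for = options["CHROME_WAIT_FOR"]
    if wait_for not in VALID_WAIT_FOR:
        raise ValueError(f"unsupported CHROME_WAIT_FOR value: {wait_for!r}")
    timeout = float(options["CHROME_PAGELOAD_TIMEOUT"])
    delay = float(options["CHROME_DELAY_AFTER_LOAD"])
    return timeout + delay


if __name__ == "__main__":
    print(worst_case_wait(parse_conf(CONF_TEXT)))
```

Under that assumption, the example values allow up to 65 seconds per page before the snapshot is taken, which is worth keeping in mind when archiving many URLs.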