[GH-ISSUE #1619] Feature Request: Integration of ReplayWeb.Page for previewing WARC/WACZ files #971

Open
opened 2026-03-01 14:47:38 +03:00 by kerem · 1 comment

Originally created by @nopper on GitHub (Dec 12, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1619

Originally assigned to: @pirate on GitHub.

What type of suggestion are you making?

New extractor / type of content to save

What is the problem that your feature request solves?

It would be great if ArchiveBox provided a simple HTML page for viewing the WARC/WACZ files it generates.

What is your proposed solution?

One suggestion would be to incorporate replayweb.page (https://github.com/webrecorder/replayweb.page?tab=readme-ov-file) as a web component for rendering these files. This would be particularly useful for sites that single-file formats do not capture adequately because of dynamic elements or scripts.
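To make the proposal concrete, here is a minimal sketch of how a preview page could be generated around the `<replay-web-page>` web component. It is not taken from ArchiveBox: the function names, the archive path in the usage note, and the pinned release number are assumptions for illustration, and the `./replay/sw.js` location assumes the component's default service-worker lookup has not been changed.

```python
from html import escape

# Assumption: a pinned replaywebpage release served from jsDelivr.
# Swap in whichever version (or self-hosted copy) you actually deploy.
RWP_VERSION = "1.8.11"
RWP_CDN = f"https://cdn.jsdelivr.net/npm/replaywebpage@{RWP_VERSION}"


def render_replay_page(archive_url: str, start_url: str) -> str:
    """Return a standalone HTML page that embeds <replay-web-page>.

    archive_url: where the browser can fetch the WARC/WACZ file (hypothetical path).
    start_url:   the original page URL to open inside the archive.
    """
    return f"""<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <title>Replay: {escape(start_url)}</title>
  <script src="{RWP_CDN}/ui.js"></script>
</head>
<body style="margin:0; height:100vh">
  <replay-web-page source="{escape(archive_url, quote=True)}"
                   url="{escape(start_url, quote=True)}"></replay-web-page>
</body>
</html>
"""


def render_service_worker() -> str:
    """Return the contents for replay/sw.js, served next to the HTML page."""
    return f'importScripts("{RWP_CDN}/sw.js");\n'
```

Calling `render_replay_page("/archive/1734000000/warc/example.warc.gz", "https://example.com/")` would give a page to save alongside the snapshot, with the output of `render_service_worker()` written to `replay/sw.js` in the same directory. Both attribute values are HTML-escaped so that URLs containing `&` or quotes do not break the markup.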

What hacks or alternative solutions have you tried to solve the problem?

Share the entire output of the archivebox version command for the current version you are using.

latest

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [x] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually
  • [ ] I'm willing to work on a PR to develop this myself
  • [ ] I have donated money to go towards fixing this issue

Mini Survey

  • [ ] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up
  • [ ] I would pay $10/mo for a hosted version of ArchiveBox if it had this feature

@pirate commented on GitHub (Dec 12, 2024):

Already have this on my roadmap 😁

I have even implemented it a couple of times: https://github.com/ArchiveBox/ArchiveBox/pull/1327/files#diff-08041ea7039132ec35c8a6d986cb5f3808decf38291f48f199a8d76af3d1cba5

The blocker is that the WARCs produced by plain wget are not standards-compliant and look terrible, so we need to switch to wget-lua or browsertrix to actually capture them, and that's a much bigger ordeal.
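The comment does not say which parts of the standard the wget output violates, so the following is only a generic sanity check, not a reproduction of that problem. It is a stdlib-only sketch that inspects the header of the first record in a WARC file and reports missing fields that the WARC 1.0/1.1 specification makes mandatory on every record. The function name is hypothetical, and a file can pass this check and still replay badly.

```python
import gzip

# Named fields the WARC 1.0/1.1 specification requires on every record.
MANDATORY_FIELDS = {"warc-record-id", "content-length", "warc-date", "warc-type"}


def first_record_problems(data: bytes) -> list[str]:
    """Return a list of problems found in the header of the first WARC record."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number: input is a .warc.gz
        data = gzip.decompress(data)

    # The record header ends at the first blank line (CRLF CRLF).
    header = data.split(b"\r\n\r\n", 1)[0].decode("utf-8", "replace")
    lines = header.split("\r\n")

    problems = []
    if not lines[0].startswith("WARC/"):
        problems.append(f"bad version line: {lines[0]!r}")

    # Field names are case-insensitive, so compare them lowercased.
    names = {line.split(":", 1)[0].strip().lower() for line in lines[1:] if ":" in line}
    for missing in sorted(MANDATORY_FIELDS - names):
        problems.append(f"missing mandatory field: {missing}")
    return problems
```

For a real archive, something like `first_record_problems(open(path, "rb").read())` would do for small files. This sketch reads the whole file into memory, so a streaming reader would be a better fit for large crawls.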
