[GH-ISSUE #1283] Question: How to save a single page from eBay completely? #3808

opened 2026-03-15 00:32:04 +03:00 by kerem · 4 comments

Originally created by @Binarus on GitHub (Dec 7, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1283

Dear all,

For quite a while, I have been trying to find a method to save a single page from the eBay website completely. By "completely" I mean that every UI element on the archived page should work as it does on the original page. The purpose is to archive my own offers, and perhaps a few other interesting offers.

However, this does not seem to be possible. I have already tried the tools from WebRecorder (WebArchive.page and WebReplay.page, both the Chrome extension and the standalone version). I have also saved such pages in MHTML and WARC format, using various tools and extensions. However, when loading those saved files into the browser or replaying them, essential parts were missing.

A few hours ago, I came across ArchiveBox and had high hopes that it would let me do what I want :-). However, after solving a few problems (this was my first experience with Docker), it turned out to have exactly the same problem as the WebRecorder tools:

After archiving an eBay offer, some UI elements on the archived page do not work as they do on the original page. For example, none of the image-related UI functions work anymore: if there are multiple images, only the first is shown and the others can't be selected; if I move the mouse cursor over the image, the magnifier does not work; if I click on an image, the image carousel does not open. And so on ...

Now I have two questions:

  1. Could anybody please explain the underlying problem?

I could imagine that JavaScript files are missing from the archive, and I understand that the technical problems are not easy to solve, given that a page can contain scripts that load other scripts, and so on. However, browsers process these pages correctly, and since a headless browser is used (among other tools) to create the archive, I would have expected archived pages to work exactly like the originals.

  2. Is there something I can do to improve the situation (besides contributing to the project, which is currently not possible for me)?

Thank you very much in advance,

Binarus

kerem closed this issue 2026-03-15 00:32:09 +03:00

@pirate commented on GitHub (Dec 8, 2023):

I'm afraid that if the Webrecorder tools can't do it, ArchiveBox definitely won't be able to. Their archiving tech is way better for interactive JS-heavy pages than mine. I'm guessing you need to record the pages with a specialized puppeteer/user script to be able to get all the elements you need, likely because the JS is making dynamic requests to a server upon user interactions that are not being captured by your recorder.

You might need to hire a dev to build that (I'm available for consulting in January) or look into the puppeteer recorder extension + browsertrix cloud by the Webrecorder team.

The unfortunate reality of archiving these days is that the big web companies deliberately obfuscate their web apps to make adblocking, user scripts, and other unwanted tampering harder, which makes archiving harder as well and leads to situations like this one.
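To make the "dynamic requests upon user interactions" explanation concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how ArchiveBox or Webrecorder actually store or replay captures: it assumes replay is a plain exact-URL lookup, and every URL below is hypothetical (not a real eBay endpoint). If, as suspected above, the magnifier and carousel fetch extra resources only on hover or click, a crawl that just loads the page never records them, so replay has nothing to serve.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Toy capture: maps each recorded URL to its response body."""
    responses: dict[str, bytes] = field(default_factory=dict)

    def record(self, url: str, body: bytes) -> None:
        self.responses[url] = body

    def replay(self, url: str) -> tuple[int, bytes]:
        # Replay can only answer requests that were seen during capture;
        # anything else is effectively a 404 inside the archived page.
        if url in self.responses:
            return 200, self.responses[url]
        return 404, b""


def capture(archive: Archive, requests: list[str]) -> None:
    """Record every request the browser made while the recorder was running."""
    for url in requests:
        archive.record(url, f"body of {url}".encode())


def missing(archive: Archive, requests: list[str]) -> list[str]:
    """Return the requests that replay cannot serve."""
    return [url for url in requests if archive.replay(url)[0] == 404]


# Hypothetical URLs for illustration only; these are not real eBay endpoints.
ON_LOAD = ["https://example.com/itm/123", "https://example.com/app.js",
           "https://example.com/img/1-small.jpg"]
ON_HOVER = ["https://example.com/img/1-zoom.jpg"]           # magnifier
ON_CLICK = ["https://example.com/api/gallery?item=123",
            "https://example.com/img/2-large.jpg"]          # carousel

# 1) Plain crawl: load the page, wait, save.
plain = Archive()
capture(plain, ON_LOAD)
print(missing(plain, ON_HOVER + ON_CLICK))  # lists all three interaction URLs

# 2) Scripted crawl: hover and click during capture so those requests are recorded too.
scripted = Archive()
capture(scripted, ON_LOAD + ON_HOVER + ON_CLICK)
print(missing(scripted, ON_HOVER + ON_CLICK))  # []
```

Real replay tools use more elaborate URL matching than an exact dictionary lookup, but the principle is the same: a scripted capture (the Puppeteer/user-script approach suggested above) has to actually perform the hovers and clicks so that the resulting requests end up in the archive.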


@Binarus commented on GitHub (Dec 8, 2023):

Thank you very much for your explanations!

I'll just do without archiving those offers, then. If it were for my company, I would certainly consider hiring you; thanks for the hint! However, the desire to archive eBay pages is purely private, which unfortunately means that hiring somebody to achieve it is not an option in this case :-(

I guess I should now close this issue. Thanks again!


@pirate commented on GitHub (Dec 8, 2023):

Give browsertrix cloud a try as a last resort! It's open source and has custom user script support. Might take some work to set up but it's great software.


@Binarus commented on GitHub (Dec 18, 2023):

Thanks for the tip!

However, to be honest, setting up Browsertrix seems like too much effort. During the holiday season, I may find some time to look into Puppeteer / user scripts. Currently, I have no clue how they work, but I am looking forward to learning.

Best regards,

Binarus
