mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #883] Bug: Empty image spaces where images are supposed to be #3567
Originally created by @Unrepentant-Atheist on GitHub (Oct 25, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/883
Describe the bug
Empty image spaces appear where images are supposed to be. The SingleFile and Wget outputs both show empty images.
Steps to reproduce
Go to https://mariushosting.com/ and archive any of the posts
Screenshots or log output
https://ibb.co/QJHGWzC
ArchiveBox version
latest
@pirate commented on GitHub (Oct 26, 2021):
Try increasing the download timeout in case the site is slow:
archivebox config --set TIMEOUT=180
@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):
And how would I write that in a docker-compose.yml? I can't do it in the console because...
@pirate commented on GitHub (Oct 27, 2021):
docker-compose run archivebox config --set TIMEOUT=240
or just change your TIMEOUT line in docker-compose.yml:
@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):
The line
- TIMEOUT=240
in docker-compose.yml is something I added after your comment, but I'm not seeing any effect. I can't run
docker compose run archivebox config --set TIMEOUT=240
because I run Portainer on Server A, but the ArchiveBox container is on Server B, which runs portainer_agent. I deploy ArchiveBox as a stack on Server B through the Portainer instance on Server A.
@pirate commented on GitHub (Oct 27, 2021):
If there's no change, then it's probably not a timeout issue; the images are probably just not archivable with those methods on that particular site.
@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):
Well... not true. When I run
wget --mirror --html-extension --no-parent --convert-links --page-requisites "url"
I get everything.
@pirate commented on GitHub (Oct 27, 2021):
Can you post the docker logs from the archiving / the output of running the wget command that archivebox runs (you can find it in the logs).
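To compare ArchiveBox's wget output against a manual run, the wget command quoted above can be wrapped so it is easy to rerun against any affected post. This is a minimal sketch that assumes `wget` is installed and on PATH. `build_wget_mirror_cmd` and `run_mirror` are hypothetical helper names. The flags are exactly those from the comment above. `--directory-prefix` is an optional addition that keeps the output in its own folder. In current wget releases, `--html-extension` is the older name of `--adjust-extension`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_wget_mirror_cmd(url: str, output_dir: Optional[str] = None) -> List[str]:
    """Return the argv for the manual wget run quoted above (hypothetical helper)."""
    cmd = [
        "wget",
        "--mirror",           # recursive download with timestamping, infinite depth
        "--html-extension",   # append .html to HTML pages saved without it
        "--no-parent",        # never ascend above the starting URL's directory
        "--convert-links",    # rewrite links so the saved copy works offline
        "--page-requisites",  # also fetch images, CSS and scripts the page needs
    ]
    if output_dir is not None:
        # Optional addition, not part of the original command.
        cmd += ["--directory-prefix", output_dir]
    cmd.append(url)
    return cmd


def run_mirror(url: str, output_dir: Optional[str] = None) -> int:
    """Run the manual wget command and return its exit code (hypothetical helper)."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(build_wget_mirror_cmd(url, output_dir)).returncode
```

For example, run `run_mirror(post_url, "manual-wget")` on one of the affected posts. Then compare the image files it saves with the wget output of the same ArchiveBox snapshot. This shows whether the images were fetched at all. ArchiveBox invokes wget with its own set of flags, and the exact command appears in its logs, so the two runs are not guaranteed to behave the same.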
@Unrepentant-Atheist commented on GitHub (Oct 28, 2021):
https://www.toptal.com/developers/hastebin/mayutesoru.txt
@Unrepentant-Atheist commented on GitHub (Nov 1, 2021):
I went to the wiki and found this: https://github.com/gildas-lormeau/SingleFile/. I tried it on all the archived URLs that had missing images, and every file saved with it worked and had all the images. Maybe implement this in ArchiveBox!
@iwconfig commented on GitHub (Dec 11, 2021):
I can confirm I have this issue as well.
Isn't SingleFile already implemented?
EDIT: Just noticed you mentioned SingleFile in the issue description. What is the difference between the ArchiveBox SingleFile and https://github.com/gildas-lormeau/SingleFile/?
@pirate commented on GitHub (Dec 11, 2021):
ArchiveBox's SingleFile is gildas-lormeau/SingleFile.
@iwconfig commented on GitHub (Apr 5, 2022):
Could it be that some images don't load until you scroll them into view? I've noticed that when saving https://www.svt.se/ with obelisk, only the first few images are saved. That makes sense when you inspect the network activity while scrolling the page in the browser.
The strange thing is that when I use SingleFile in my Firefox browser, it makes a GET request for every image on the page (svt.se) without scrolling. It even tells you it's grabbing "deferred images". I get the same result in my Chromium browser.
Why doesn't SingleFile do this with the headless Chrome(ium?) instance in ArchiveBox as well? Would autoscroll fix the issue? That doesn't explain why it works in headful browsers but not in headless, though.
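The lazy-loading theory above can be checked against a saved page. Some images may still have no `src`, or only an inline `data:` placeholder, while the real URL sits in an attribute such as `data-src`. In that case the images were deferred and never loaded before the snapshot was taken. This is a minimal sketch using Python's standard `html.parser`. The attribute names in `LAZY_ATTRS` are common lazy-loading conventions assumed here; they are not confirmed for the sites in this thread. `find_deferred_images` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Attributes that lazy-loading scripts commonly use to hold the real image URL.
# This list is an assumption and is not exhaustive.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-srcset")


class DeferredImageFinder(HTMLParser):
    """Collect <img> tags whose real URL sits only in a lazy-loading attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.deferred: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        lazy_url = next((attr_map[name] for name in LAZY_ATTRS if attr_map.get(name)), None)
        # A missing src or an inline data: placeholder suggests the real image
        # was never swapped in before the page was saved.
        if lazy_url and (not src or src.startswith("data:")):
            self.deferred.append(lazy_url)


def find_deferred_images(html: str) -> List[str]:
    """Return the raw lazy-loading attribute values of images that never loaded."""
    finder = DeferredImageFinder()
    finder.feed(html)
    finder.close()
    return finder.deferred
```

Try it on the SingleFile or wget HTML of an affected snapshot, for example `find_deferred_images(open(path, encoding="utf-8").read())`. A non-empty list would support the scroll-into-view explanation. An empty list suggests the cause lies elsewhere.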
@pirate commented on GitHub (Apr 12, 2022):
It could be because we aren't using the latest version; SingleFile adds new features all the time, and we're a bit behind. The next ArchiveBox release will bump it to the latest version + latest Chrome version.
@GlassedSilver commented on GitHub (May 10, 2022):
Would it be feasible to update SingleFile from the source periodically automatically?
Same for ytdl/yt-dlp.
Releases taking their time is okay, but it'd be good to have instances pull the tools they need independently of releases.
The web is moving faster and faster, and keeping dependencies like these up to date is crucial for getting consistently good mirrors.
Since I run archivebox in a docker image it'd be really cool if this functionality could be baked in. :)
Another site to test this with: https://xemu.app/
@melyux commented on GitHub (Jul 15, 2023):
Please bump SingleFile; the current version is ancient. It is so ancient that the example SINGLEFILE_ARGS given in the config (--load-deferred-images-dispatch-scroll-event=true) doesn't even work on the bundled version.
@pirate commented on GitHub (Aug 13, 2023):
SingleFile should already be bumped in the latest dev branch; please try that version: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch
@pirate commented on GitHub (Nov 9, 2023):
SingleFile and Chrome are both on the most recent versions in ArchiveBox 0.7.1, so this should be resolved. Please comment back here if you're still having issues and I'll re-open the ticket.