[GH-ISSUE #883] Bug: Empty image spaces where images are supposed to be #2057

Closed
opened 2026-03-01 17:56:07 +03:00 by kerem · 17 comments

Originally created by @Unrepentant-Atheist on GitHub (Oct 25, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/883

Describe the bug

Empty image spaces where images are supposed to be. The SingleFile and wget outputs both show empty images.

Steps to reproduce

Go to https://mariushosting.com/ and archive any of the posts

Screenshots or log output

https://ibb.co/QJHGWzC

ArchiveBox version

latest

kerem closed this issue 2026-03-01 17:56:07 +03:00

@pirate commented on GitHub (Oct 26, 2021):

Try increasing the download timeout in case the site is slow: `archivebox config --set TIMEOUT=180`.


@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):

How would I write that in a docker-compose.yml? I can't do it in the console because:

```
[!] ArchiveBox should never be run as root!
    For more information, see the security overview documentation:
        https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#do-not-run-as-root
```

```yaml
---
version: '3'
services:
    archivebox:
        image: archivebox/archivebox
        container_name: archivebox
        command: server --quick-init 0.0.0.0:8000
        ports:
            - 8571:8000
        environment:
            - ALLOWED_HOSTS=*
            - MEDIA_MAX_SIZE=750m
            - TIMEOUT=240
        labels:
            - deunhealth.restart.on.unhealthy=true
        volumes:
            - /home/user/docker-data/archivebox:/data
        restart: always
networks:
  default:
    name: dockernet
    external: true
```

@pirate commented on GitHub (Oct 27, 2021):

`docker-compose run archivebox config --set TIMEOUT=240`, or just change the TIMEOUT line in your docker-compose.yml:

```yaml
...

        environment:
            - TIMEOUT=240
```

@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):

The `- TIMEOUT=240` line in docker-compose.yml is something I added after your comment, but I'm not seeing any effect. I can't run `docker compose run archivebox config --set TIMEOUT=240` because I run Portainer on Server A, while the ArchiveBox container is on Server B, which runs portainer_agent. I deploy ArchiveBox as a stack on Server B through the Portainer instance on Server A.


@pirate commented on GitHub (Oct 27, 2021):

If there's no change, then it's probably not a timeout issue; the images are probably just not archivable with those methods for that particular site.


@Unrepentant-Atheist commented on GitHub (Oct 27, 2021):

Well... not true. When I run `wget --mirror --html-extension --no-parent --convert-links --page-requisites "url"`, I get everything.
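
One way to narrow down where the images go missing is to compare a saved page against the files actually on disk. The sketch below is a hypothetical helper (`missing_local_images` is not part of ArchiveBox or wget) and assumes the page was saved with `--convert-links`, as in the command above: wget rewrites links to files it downloaded as relative paths and leaves links to files it did not download as absolute URLs.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def missing_local_images(html_path):
    """Return the <img> srcs in a saved page that did not end up as local files.

    Assumes the page was saved with wget --convert-links (see lead-in).
    """
    with open(html_path, encoding="utf-8", errors="replace") as f:
        html = f.read()
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()

    base_dir = os.path.dirname(os.path.abspath(html_path))
    missing = []
    for src in collector.srcs:
        parsed = urlparse(src)
        if parsed.scheme == "data":
            continue  # inline image, nothing to download
        if parsed.scheme in ("http", "https") or src.startswith("//"):
            missing.append(src)  # still points at the live site
            continue
        # Relative link: check that the file actually exists next to the page.
        local_path = os.path.join(base_dir, unquote(parsed.path))
        if not os.path.exists(local_path):
            missing.append(src)
    return missing
```

Inline `data:` images are skipped because there is nothing to fetch, so a page can pass this check and still show blank boxes if its `src` attributes hold inline placeholders.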


@pirate commented on GitHub (Oct 27, 2021):

Can you post the Docker logs from the archiving run, or the output of running the wget command that ArchiveBox runs (you can find it in the logs)?


@Unrepentant-Atheist commented on GitHub (Oct 28, 2021):

https://www.toptal.com/developers/hastebin/mayutesoru.txt


@Unrepentant-Atheist commented on GitHub (Nov 1, 2021):

I went to the wiki and found https://github.com/gildas-lormeau/SingleFile/. I tried it on all the archived URLs that had missing images, and every single file made with it worked and had all the images. Maybe integrate this into ArchiveBox!


@iwconfig commented on GitHub (Dec 11, 2021):

I can confirm I have this issue as well.

Isn't SingleFile already implemented?

EDIT: Just noticed you mentioned SingleFile in the issue description. What is the difference between the ArchiveBox SingleFile and https://github.com/gildas-lormeau/SingleFile/?


@pirate commented on GitHub (Dec 11, 2021):

ArchiveBox Singlefile is gildas-lormeau/SingleFile.


@iwconfig commented on GitHub (Apr 5, 2022):

Could it be due to the fact that some images don't load until you scroll them into view? I've noticed that when saving https://www.svt.se/ using [obelisk](https://github.com/go-shiori/obelisk), only the first few images are saved, which makes sense when inspecting the network activity while scrolling the page in the browser.

The strange thing is, when I use SingleFile in my Firefox browser, it does GET request for every image in the page (svt.se), without scrolling. It even tells you it's grabbing "deferred images". Same result in my Chromium browser.

Why doesn't SingleFile do this with the headless Chrome(ium?) instance in ArchiveBox as well? Would autoscroll fix the issue? That doesn't explain why it works in headful browsers but not in headless, though.
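
The lazy-loading theory can be checked against a saved page. With script-based lazy loading, the real URL usually sits in a `data-*` attribute while `src` holds a placeholder; with native `loading="lazy"`, the browser may not fetch images that never come near the viewport. The minimal stdlib-only sketch below flags both cases. The attribute names in `LAZY_ATTRS` are common conventions, not something confirmed for the sites in this thread, and `find_deferred_images` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Attribute names that JavaScript lazy-loading scripts commonly use to hold the
# real image URL until the image scrolls into view. This list is an assumption;
# a given site may use other names.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-srcset", "data-original")


class DeferredImageFinder(HTMLParser):
    """Flags <img> tags whose real image is probably only fetched on scroll."""

    def __init__(self):
        super().__init__()
        # Each entry is (current src, deferred URL, marker that flagged it).
        self.deferred = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        for name in LAZY_ATTRS:
            if a.get(name):
                # Script-driven lazy loading: src is often just a placeholder.
                self.deferred.append((src, a[name], name))
                return
        if (a.get("loading") or "").lower() == "lazy":
            # Native lazy loading: src is real, but the browser may skip
            # fetching it until the image is near the viewport.
            self.deferred.append((src, src, "loading=lazy"))


def find_deferred_images(html):
    """Return the deferred-image entries found in an HTML string."""
    finder = DeferredImageFinder()
    finder.feed(html)
    finder.close()
    return finder.deferred
```

If a saved page turns up many `data-*` entries with placeholder `src` values, the capture tool never triggered the site's lazy-loading script, which would match the symptom of empty image boxes.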


@pirate commented on GitHub (Apr 12, 2022):

Could be because we aren't using the latest version, SingleFile is adding new features all the time and we're a bit behind. The next ArchiveBox release will bump it to the latest version + latest Chrome version.


@GlassedSilver commented on GitHub (May 10, 2022):

> Could be because we aren't using the latest version, SingleFile is adding new features all the time and we're a bit behind. The next ArchiveBox release will bump it to the latest version + latest Chrome version.

Would it be feasible to update SingleFile from the source periodically automatically?

Same for ytdl/yt-dlp.

The releases taking their time is okay, but it'd be good to have instances pull the tools they need independently of releases.

The web is moving faster and faster, and keeping up with dependencies like these is crucial to getting consistently good mirrors.

Since I run ArchiveBox in a Docker image, it'd be really cool if this functionality could be baked in. :)

Another site to test this with: https://xemu.app/
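
The core decision in a periodic self-update is whether a bundled tool is behind upstream. How to look up the installed and latest versions depends on the tool and is left out here; the sketch below only shows the comparison step, with hypothetical helper names. It compares versions numerically, because a plain string comparison would rank `1.2.9` above `1.2.10`.

```python
import re


def parse_version(text):
    """Turn a version string like 'v1.2.10' or '2023.07.06' into a tuple of ints.

    Only the numeric parts are kept, so pre-release tags such as '-beta'
    are not treated specially (a simplification).
    """
    return tuple(int(part) for part in re.findall(r"\d+", text))


def needs_update(installed, latest):
    """True if `installed` is older than `latest`, comparing numerically."""
    a, b = parse_version(installed), parse_version(latest)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

Date-style versions like yt-dlp's `2023.07.06` work with the same comparison, since they also split into numeric parts.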


@melyux commented on GitHub (Jul 15, 2023):

Please bump Singlefile; the current version is _ancient_. So ancient that the example `SINGLEFILE_ARGS` given in the config (`--load-deferred-images-dispatch-scroll-event=true`) doesn't even work on the bundled version because it's too old.


@pirate commented on GitHub (Aug 13, 2023):

Singlefile should already be bumped in the latest dev branch, please try that version: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch


@pirate commented on GitHub (Nov 9, 2023):

Singlefile and Chrome are both on the most recent versions in [ArchiveBox 0.7.1](https://github.com/ArchiveBox/ArchiveBox/releases), so this should be resolved. Please comment back here if you're still having issues and I'll re-open the ticket.
