[GH-ISSUE #1109] ArchiveBox Docker image is very out-of-date #2207

Closed
opened 2026-03-01 17:57:17 +03:00 by kerem · 11 comments

Originally created by @sergeyvolk on GitHub (Feb 28, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1109

I just installed ArchiveBox today. Since https://archivebox.io/#quickstart recommended installing via Docker I went with that.
I installed Docker on my Ubuntu 22.10 machine and ran the ArchiveBox install script (curl -sSL 'https://get.archivebox.io' | sh). The installation was successful, but when I tried archiving a few URLs, media downloads failed for all YouTube URLs. I looked into it and was surprised to see that the Docker image pulled by the installation script is ~2 years old: it ships youtube-dl v2021.04.26, which doesn't work anymore.
Is the Docker install still the recommended way to set up ArchiveBox? If it is, then perhaps the Docker images need to be updated on a regular basis?

kerem closed this issue 2026-03-01 17:57:17 +03:00

@sergeyvolk commented on GitHub (Feb 28, 2023):

Btw, one interesting thing that I've just noticed is that Docker Desktop shows two archivebox images. The first one is ~2 years old, has Tag='master' and hash value a8da1f0258b3f7d8f01948a458ba3f9e8a71a85a8763a2a144413da8e55bb519. The second one was created 10 months ago, has Tag='latest' and hash d74dfbfd46b85f7e1b503606fb30b2c2e7d8093285046b48b602080fc303d276. For some reason the first image (the one that's ~2 years old) is being used by the archivebox instance on my machine. I'm not sure why the installer script set things up like that. But even if I find a way to switch to the docker image marked 'latest' - if it really is 10 months old, I guess things might be a little better, but I think youtube-dl will still be out of date.
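The tags, IDs, and ages described above can also be read from the command line. A minimal sketch, assuming the local images live under the `archivebox/archivebox` repository name; the two function names are made up for illustration:

```shell
# Hypothetical helpers for checking which ArchiveBox images exist locally
# and which image each running container was started from.

# List every local archivebox/archivebox image with its tag, ID, and age.
list_archivebox_images() {
    docker image ls \
        --format '{{.Repository}}:{{.Tag}}  {{.ID}}  {{.CreatedSince}}' \
        archivebox/archivebox
}

# Show the name of each running container next to the image it uses.
running_archivebox_images() {
    docker ps --format '{{.Names}}  {{.Image}}'
}
```

Calling `list_archivebox_images` and then `running_archivebox_images` should make it clear whether the running container was started from `master` or `latest`.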


@sergeyvolk commented on GitHub (Feb 28, 2023):

One more note: I installed Docker Desktop on my Ubuntu, so I believe the ArchiveBox installation script used the docker-compose path.


@pirate commented on GitHub (Feb 28, 2023):

This is correct: the official stable `master` release has not been updated in a while. Work has continued in the `dev` branch over the last year, but I haven't had the time to fully test everything and roll an official release, so I've been gradually pointing users to the unstable branch to make sure it's stable. If you'd like to try the unstable `dev` branch with the latest `youtube-dl` and `yt-dlp` versions and other fixes, you can set your image to `archivebox/archivebox:dev`: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch.
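For a docker-compose setup, switching the image tag could look roughly like the sketch below. It assumes the compose file references the image on an unquoted `image: archivebox/archivebox[:tag]` line with no trailing comment; the helper name `switch_archivebox_tag` is hypothetical, so check your own `docker-compose.yml` before relying on it.

```shell
# Hypothetical helper: point a docker-compose.yml at another ArchiveBox image tag.
# Assumes an unquoted "image: archivebox/archivebox[:tag]" line with no trailing comment.
switch_archivebox_tag() {
    compose_file="$1"
    new_tag="$2"
    tmp_file="$(mktemp)" || return 1
    # Replace the existing tag (or add one if none is present) on the image line,
    # then write the result back in place so file permissions are preserved.
    sed -E "s#(image:[[:space:]]*archivebox/archivebox)(:[^[:space:]]*)?[[:space:]]*\$#\1:${new_tag}#" \
        "$compose_file" > "$tmp_file" \
        && cat "$tmp_file" > "$compose_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

After `switch_archivebox_tag docker-compose.yml dev`, running `docker compose pull` followed by `docker compose up -d` should fetch and start the new image (older installs may need `docker-compose` instead).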


@neingeist commented on GitHub (Mar 22, 2023):

I've hit this too, and used `latest` for my first tests (which seems to be only a few commits ahead of the latest versioned tag).

I've seen the "latest" Docker tag mostly used as the latest stable release, while "edge" or "master"/"main" is common for the dev or "master"/"main" branches.


@neingeist commented on GitHub (Mar 22, 2023):

(`dev` works much better)


@hbd commented on GitHub (Apr 18, 2023):

Same here: I ran into an issue with youtube-dl in `master` (the default when setting up with the docker-compose or curl methods), realized the image tagged `master` on Docker Hub is very old, updated to `dev`, and it's working beautifully. @pirate how can we help test?


@pirate commented on GitHub (Jun 13, 2023):

Going to close this for the same reason as https://github.com/ArchiveBox/ArchiveBox/issues/1132. I am plenty aware that the Docker `:master` image should be bumped 💜

I am working through a backlog of issues that need to be fixed before I am ready to roll the next "stable" release. I understand this is annoying for users accustomed to the much more rapid monthly release cycle of other homelab apps like Plex, Nextcloud, etc. (as I would be too!).

Because there are a bunch of slow-moving orgs that depend on the bare-metal (pypi/brew/pkg) ArchiveBox releases being non-breaking, I would rather direct users expecting faster bugfixes to the Docker `archivebox/archivebox:dev` image (until I am ready to roll a new stable release). The bare-metal users are able to upgrade their extractor dependencies independently, and the Docker users are able to use `:dev` which bundles the most recent dependencies, so most users have a reasonably accessible solution to running up-to-date ArchiveBox if they need it urgently.
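For the bare-metal route, upgrading an extractor independently might look like the sketch below. It assumes a pip-based install where `yt-dlp` lives in the same Python environment as ArchiveBox; the function name is made up, and whether ArchiveBox actually picks up the upgraded tool depends on its configured extractor binary (the `YOUTUBEDL_BINARY` setting, as far as I know).

```shell
# Hypothetical helper: refresh the media extractor on a bare-metal (pip) install.
upgrade_media_extractor() {
    # Upgrade yt-dlp in the Python environment that python3 resolves to.
    python3 -m pip install --upgrade yt-dlp || return 1
    # Print the installed version as a sanity check.
    python3 -m yt_dlp --version
}
```

Run `upgrade_media_extractor` from the same environment ArchiveBox runs in, so the upgraded package is the one it finds.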

I'm still just a solo dev on this project, and my time budget for ArchiveBox has been 1-2 days a month for the last 1.5 years, limiting me to mostly answering support requests, improving docs, reviewing PRs, and pushing security fixes. The most recent release delay is partly a function of me accepting too many feature requests (increasing the surface area that I have to bugfix), and partly a function of ArchiveBox's core design philosophy, which depends on a lot of external extractor tools that change often.

I have started to devote more dedicated energy to ArchiveBox in the last month again, but it will take me some time to ramp up and work through the backlog and prepare a new stable Docker release.

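In the meantime please check out the [most recently updated GitHub Issues](https://github.com/ArchiveBox/ArchiveBox/issues?q=is%3Aopen+is%3Aissue) / [Discussions](https://github.com/ArchiveBox/ArchiveBox/discussions), and [@ArchiveBoxApp Twitter](https://twitter.com/ArchiveBoxApp) for updates.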


@madobet commented on GitHub (Nov 17, 2025):

Just leaving a comment: this affects not only the Docker version but also the bare-metal version. I checked the "auto" install script and the PPA user repo; one still targets Ubuntu 20.04 and the other was last updated in 2021, which makes the script neither "auto" nor "easy" :P


@pirate commented on GitHub (Nov 18, 2025):

PPA was discontinued a while ago. Most recent version is on pip or via github dev branch for the last beta.


@madobet commented on GitHub (Nov 19, 2025):

> PPA was discontinued a while ago. Most recent version is on pip or via github dev branch for the last beta.

Thanks, I will try it :-)


@pirate commented on GitHub (Nov 19, 2025):

Also try Browsertrix-crawler, it's a good alternative
