[GH-ISSUE #1528] v0.8.5rc0: archivebox setup in Docker fails to install some binaries #3924

Closed
opened 2026-03-15 01:00:39 +03:00 by kerem · 3 comments
Owner

Originally created by @agowa on GitHub (Oct 4, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1528

Hi, moving this into its own issue for better tracking and to keep the other one on topic.

Originally posted by @agowa in #1518

1

Hi, sorry, but the dev image still doesn't work as you suggest. When I try to run it in a new and completely empty folder, the init fails:

[user@PC-001 tmp]$ mkdir test
[user@PC-001 tmp]$ cd test
[user@PC-001 test]$ docker pull archivebox/archivebox:dev
(...)
Writing manifest to image destination
5876f1e823bb118732b24dedec99f3df2474d6dbb07cc3b096ce98adcd4da48d
[user@PC-001 test]$ docker run -it -v "$PWD:/data" archivebox/archivebox:dev init
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [2024-10-04 18:48:44] ArchiveBox v0.8.5: archivebox init                                                                                                                       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[X] This folder appears to already have files in it, but no index.sqlite3 present.

    You must run init in a completely empty directory, or an existing data folder.

    Hint: To import an existing data folder make sure to cd into the folder first, 
    then run and run 'archivebox init' to pick up where you left off.

    (Always make sure your data folder is backed up first before updating ArchiveBox)
[user@PC-001 test]$ ls -la
total 28
drwxr-xr-x  5 166446 166446   120 Oct  4 20:48 .
drwxrwxrwt 32 root   root    1120 Oct  4 20:48 ..
drwxr-xr-x  2 user   users     40 Oct  4 20:48 crontabs
drwxr-xr-x  2 166446 166446    60 Oct  4 20:48 logs
-rw-r--r--  1 166446 166446 28672 Oct  4 20:48 queue.sqlite3
drwxr-xr-x  3 166446 166446    60 Oct  4 20:48 tmp
[user@PC-001 test]$ 

However, it works when I first run docker run -it -v "$PWD:/data" archivebox/archivebox:latest init and only then run docker run -it -v "$PWD:/data" archivebox/archivebox:dev init.

Edit: After creating this ticket, I noticed that I hadn't run init with the --setup argument. I retested with it, but it produces the same error and still doesn't initialize properly.
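
The error message above states the rule that init enforces: the target must be either completely empty or an existing data folder that contains index.sqlite3. As a minimal sketch of that rule (the helper name `can_init` is hypothetical, and this is not ArchiveBox's actual implementation), the check could look like this:

```python
from pathlib import Path


def can_init(data_dir: Path) -> bool:
    """Hypothetical helper mirroring the rule stated in the error message.

    Returns True if the folder is completely empty or already contains
    index.sqlite3 (an existing data folder); returns False otherwise.
    """
    entries = {entry.name for entry in data_dir.iterdir()}
    return not entries or "index.sqlite3" in entries
```

Under this rule the folder in the listing above is rejected: by the time the check runs it already holds crontabs, logs, queue.sqlite3, and tmp, but no index.sqlite3.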

2

docker run -it -v "$PWD:/data" archivebox/archivebox:dev version shows several binaries and directories as not found, missing, or invalid, namely:

  • puppeteer
  • postlight-parser
  • readability-extractor
  • single-file
  • CUSTOM_TEMPLATES_DIR
  • LIB_DIR
  • CACHE_DIR

Full output:

ArchiveBox v0.8.5 COMMIT_HASH=396a7ff BUILD_TIME=2024-10-04 11:02:49 1728039769
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.10.10-arch1-1-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

 Dependency versions:
 √  node                  22.9.0       apt        /usr/bin/node
 √  npm                   10.9.0       apt        /usr/bin/npm
 √  pip                   24.2.0       sys_pip    /usr/local/bin/pip
 √  python                3.12.7       sys_pip    /usr/local/bin/python3.12
 √  sqlite                2.6.0        venv_pip   /usr/local/lib/python3.12/sqlite3/dbapi2.py
 √  django                5.1.1        venv_pip   /usr/local/lib/python3.12/site-packages/django/__init__.py
 √  playwright            1.47.0       sys_pip    /usr/local/bin/playwright
 X  puppeteer             None         not found None of the configured providers  were able to load binary: puppeteer
 √  ldap                  3.4.4        venv_pip   /usr/local/lib/python3.12/site-packages/ldap/__init__.py
 √  rg                    13.0.0       apt        /usr/bin/rg
 √  sonic                 1.4.9        env        /usr/local/bin/sonic
 √  chrome                129.0.6668   env        /usr/bin/chromium-browser
 √  curl                  8.10.1       apt        /usr/bin/curl
 √  git                   2.39.5       apt        /usr/bin/git
 X  postlight-parser      None         not found None of the configured providers  were able to load binary: postlight-parser
 X  readability-extractor None         not found None of the configured providers  were able to load binary: readability-extractor
 X  single-file           None         not found None of the configured providers  were able to load binary: single-file
 √  wget                  1.21.3       apt        /usr/bin/wget
 √  yt-dlp                2024.9.27    apt        /usr/bin/yt-dlp
 √  ffmpeg                5.1.6        env        /usr/bin/ffmpeg

 Source-code locations:
 √  PACKAGE_DIR           43 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         4 files         valid     /app/archivebox/templates                                                   
 X  CUSTOM_TEMPLATES_DIR  missing         invalid   ./user_templates                       
 X  LIB_DIR               1 files         invalid   ./lib/x86_64-linux-docker              
 √  TMP_DIR               4 files         valid     /tmp/archivebox                                                             

 Data locations:
 √  DATA_DIR              12 files @      valid     /data                                                                       
 √  CONFIG_FILE           83.0 Bytes      valid     ./ArchiveBox.conf                      
 √  SQL_INDEX             428.0 KB        valid     ./index.sqlite3                        
 √  QUEUE_DATABASE        92.0 KB         valid     ./queue.sqlite3                        
 √  ARCHIVE_DIR           0 files         valid     ./archive                              
 √  SOURCES_DIR           0 files         valid     ./sources                              
 -  PERSONAS_DIR          missing         disabled  ./personas                             
 √  LOGS_DIR              1 files         valid     ./logs                                 
 X  CACHE_DIR             missing         invalid   ./cache                                
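
For long reports like the one above, the failing entries can be pulled out mechanically. The following is a rough sketch that assumes only the layout visible in this output (a status marker, then a name, then details); `failing_entries` is a hypothetical helper, not part of ArchiveBox.

```python
def failing_entries(version_output: str) -> list[str]:
    """Return the names of entries whose status marker is 'X'."""
    failed = []
    for line in version_output.splitlines():
        parts = line.split()
        # Status lines look like: "<marker> <name> <details...>"
        if len(parts) >= 2 and parts[0] == "X":
            failed.append(parts[1])
    return failed


# A few lines copied from the report above, used as sample input.
sample = """\
 √  node                  22.9.0       apt        /usr/bin/node
 X  puppeteer             None         not found
 X  LIB_DIR               1 files         invalid   ./lib/x86_64-linux-docker
 -  PERSONAS_DIR          missing         disabled  ./personas
"""
print(failing_entries(sample))  # prints ['puppeteer', 'LIB_DIR']
```

Entries marked "-" (disabled) are deliberately not reported, since the output distinguishes them from invalid ones.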

@pirate commented on GitHub (Oct 5, 2024):

My apologies, should be fixed now: ac96cc62. Comment back if you're still having any issues.

➜ ⍴(ArchiveBox7:3.11) ~/D/data # docker run -v "$PWD":/data -it archivebox/archivebox:dev init
╭─────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [2024-10-05 04:42:59] ArchiveBox v0.8.5: archivebox init                                        │                                                                                                                                   
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
[+] Initializing a new ArchiveBox v0.8.5 collection...
----------------------------------------------------------------------

[+] Building archive folder structure...
    + ./archive, ./sources, ./logs...
    + ./ArchiveBox.conf...

[+] Building main SQL index and running initial migrations...
   ...
    Operations to perform:
      Apply all migrations: admin, api, auth, contenttypes, core, huey_monitor, machine, sessions, singlefile
    Running migrations:
    Applying contenttypes.0001_initial... OK
    Applying auth.0001_initial... OK
    ...
    Operations to perform:
      Apply all migrations: huey_monitor
    Running migrations:
    Applying huey_monitor.0001_initial... OK
    ...

    √ ./index.sqlite3

[*] Checking links from indexes and archive folders (safe to Ctrl+C)...

[*] [2024-10-05 04:43:01] Writing 0 links to main index...
    √ ./index.sqlite3

----------------------------------------------------------------------
[√] Done. A new ArchiveBox collection was initialized (0 links).

    Hint: To view your archive index, run:
        archivebox server  # then visit http://127.0.0.1:8000

    To add new links, you can run:
        archivebox add < ~/some/path/to/list_of_links.txt

    For more usage and examples, run:
        archivebox help

@agowa commented on GitHub (Oct 5, 2024):

Sorry to disappoint, but the init still doesn't work for me: there is an issue with ffmpeg and the distutils Python module.

I now have this version: ArchiveBox v0.8.5 COMMIT_HASH=beefe69 BUILD_TIME=2024-10-05 05:06:09 1728104769

When I run init --setup (it appears to work without --setup, though), it fails with:

[+] Locating / Installing ffmpeg using apt or brew or env...
$ /usr/bin/apt-get update -qq
$ /usr/bin/apt-get install -y ffmpeg

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
{
    'name': 'ffmpeg',
    'description': 'ffmpeg',
    'loaded_abspath': PosixPath('/usr/bin/ffmpeg'),
    'loaded_version': (5, 1, 6),
    'loaded_sha256': '2a20298ac77188bbc563e690ff179e6bdf1ec27db79055b8cb2a32f6b336f7cb',
    'bin_filename': 'ffmpeg',
    'is_executable': True,
    'is_script': False,
    'is_valid': True,
    'loaded_bin_dirs': {'env': '/usr/bin:/bin', 'apt': '/usr/bin:/bin'},
    'python_name': 'ffmpeg'
}

[√] Set up ArchiveBox and its dependencies successfully.
Traceback (most recent call last):
  File "/tmp/NotInsideAVenv/bin/archivebox", line 5, in <module>
    from archivebox.cli import main
  File "/tmp/NotInsideAVenv/lib/python3.12/site-packages/archivebox/cli/__init__.py", line 11, in <module>
    from ..config import OUTPUT_DIR, check_data_folder, check_migrations
  File "/tmp/NotInsideAVenv/lib/python3.12/site-packages/archivebox/config.py", line 33, in <module>
    import django
  File "/tmp/NotInsideAVenv/lib/python3.12/site-packages/django/__init__.py", line 1, in <module>
    from django.utils.version import get_version
  File "/tmp/NotInsideAVenv/lib/python3.12/site-packages/django/utils/version.py", line 6, in <module>
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'

    Hint: To view your archive index, run:
        archivebox server  # then visit http://127.0.0.1:8000

    To add new links, you can run:
        archivebox add < ~/some/path/to/list_of_links.txt

    For more usage and examples, run:
        archivebox help
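
The traceback above ends in ModuleNotFoundError: No module named 'distutils' under a python3.12 path. distutils was removed from the standard library in Python 3.12 (PEP 632), so any code that still imports it fails unless a replacement, such as the copy shipped with setuptools, is installed in that environment. A small sketch for checking a given interpreter follows; `has_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    # The result depends on the interpreter and on whether setuptools is installed.
    print(f"Python {version}: distutils importable = {has_module('distutils')}")
```

Running this with the same interpreter that produced the traceback shows whether the missing module is a property of that environment rather than of the data folder.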


@pirate commented on GitHub (Oct 5, 2024):

Yup, there is no need for --setup anymore in Docker! That's one of the benefits of the new version: it now comes with all the dependencies (though it's designed so that you can still swap in your own if you prefer some other version over the bundled ones).

I will update the docs once it's released. In the meantime I've added a patch so those errors don't show in case people run it with --setup out of habit.
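
The patch itself is not shown in this thread. Purely as an illustration of the kind of guard that would avoid the apt-get lock error reported earlier, the sketch below skips a package-manager install when the binary is already on PATH or when the process is not running as root. The function name is hypothetical, and this is not ArchiveBox's code.

```python
import os
import shutil


def should_attempt_apt_install(binary: str) -> bool:
    """Hypothetical guard: only try apt when it is needed and can succeed.

    apt-get needs root to take the dpkg lock, so attempting an install as
    an unprivileged user fails with "Permission denied".
    """
    already_present = shutil.which(binary) is not None
    running_as_root = os.geteuid() == 0  # Unix-only call
    return (not already_present) and running_as_root
```

In the failing run above, ffmpeg was already present at /usr/bin/ffmpeg, so a guard of this shape would have skipped the apt-get call entirely.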
