[GH-ISSUE #1518] Bug: wget fails on https://user:pass@domain/ URLs using HTTP basic auth #3920

Open
opened 2026-03-15 00:58:56 +03:00 by kerem · 9 comments
Owner

Originally created by @agowa on GitHub (Sep 22, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1518

Describe the bug

archivebox update shows

> wget
        Extractor failed:                                                                                                                           
             Wget failed or got an error from the server
        Run to see full output:

but when I follow the steps to see the full output, it just works. The file it tried to download was a PDF from a web server protected by HTTP basic auth, with the credentials embedded in the URL.
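
For reference, a URL of this form carries the credentials in its userinfo part. The following is a minimal Python sketch, using only the standard library and the placeholder URL from this report, of how such a URL splits and which Basic `Authorization` value the credentials correspond to. The helper name `basic_auth_header` is hypothetical and is not part of ArchiveBox or wget; percent-encoded credentials would additionally need decoding, which this sketch does not do.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url: str) -> str:
    """Build the HTTP Basic Authorization value from user:pass embedded in a URL."""
    parts = urlsplit(url)
    if parts.username is None or parts.password is None:
        raise ValueError("URL has no embedded user:pass credentials")
    raw = f"{parts.username}:{parts.password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder URL taken from the report above.
url = "https://user:pass@domain/+some/sub/dirs/filename.pdf"
parts = urlsplit(url)
print(parts.hostname)            # host without the credentials
print(parts.path)                # path that is actually requested
print(basic_auth_header(url))    # header value a client would send
```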

Steps to reproduce

I installed ArchiveBox using the docker-compose steps in the README. When I capture a site and then try to view it, ArchiveBox first shows a page that says:

Maybe this resource type is not availabe for this Snapshot,
or the archiving process has not completed yet?
# run this cmd to finish archiving this Snapshot
archivebox update -t timestamp 1727017909.005329

Then I ran that command (docker compose run archivebox update -t timestamp 1727017909.005329) and got this error:

[i] [2024-09-22 15:22:22] ArchiveBox v0.7.2: archivebox update -t timestamp 1727017909.005329
    > /data


[▶] [2024-09-22 15:22:24] Starting archiving of 1 snapshots in index...

[√] [2024-09-22 15:22:24] "user:pass@domain/+some/sub/dirs/filename.pdf"
    https://user:pass@domain/+some/sub/dirs/filename.pdf
    √ ./archive/1727017909.005329
      > wget
        Extractor failed:                                                                                                                           
             Wget failed or got an error from the server
        Run to see full output:
          docker run -it -v $PWD/data:/data archivebox/archivebox /bin/bash
            cd /data/archive/1727017909.005329;
            wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --timeout=60 --restrict-file-names=windows --warc-file=/data/archive/1727017909.005329/warc/1727018544 --page-requisites "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.21.3" --compression=auto "https://user:pass@domain/+some/sub/dirs/filename.pdf"

        8 files (1.2 MB) in 0:00:00s 

[√] [2024-09-22 15:22:24] Update of 1 pages complete (0.35 sec)
    - 0 links skipped
    - 1 links updated
    - 1 links had errors

But when I run those three commands to see the full output, I only get this:

Authentication selected: Basic realm="Restricted Files"
2024-09-22 15:30:20 URL:https://user:pass@domain/+some/sub/dirs/filename.pdf [244137/244137] -> "domain/+some/sub/dirs/filename.pdf" [1]
FINISHED --2024-09-22 15:30:20--
Total wall clock time: 0.2s
Downloaded: 1 files, 238K in 0.03s (7.83 MB/s)

Checking the exit code with echo $? afterwards also reports success. The same happens when running it with --verbose instead of --no-verbose.

Edit: It also works when I run the provided command directly, without an interactive TTY attached, so I don't think that is the issue here. Tested using:

docker run --entrypoint "" --workdir "/data/archive/1727017909.005329" -v $PWD/data:/data archivebox/archivebox wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --timeout=60 --restrict-file-names=windows --warc-file=/data/archive/1727017909.005329/warc/1727018544 --page-requisites "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.21.3" --compression=auto "https://user:pass@domain/+some/sub/dirs/filename.pdf"
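
One way to narrow down a mismatch like this is to run the same argument list from a program rather than a shell, and record the exit status and output together. The sketch below is a diagnostic aid only: it is not ArchiveBox's extractor code, and the names `run_and_report` and `describe_wget_status` are hypothetical. The status table follows the exit statuses listed in the GNU Wget manual. The demo uses the Python interpreter as a stand-in child process so that it runs anywhere; in practice the wget argument list from the log would be passed instead, with `cwd` set to the snapshot directory.

```python
import subprocess
import sys
from typing import List, Tuple

# Exit statuses as documented in the GNU Wget manual.
WGET_EXIT_STATUS = {
    0: "no problems occurred",
    1: "generic error code",
    2: "parse error (command line or .wgetrc)",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol errors",
    8: "server issued an error response",
}


def run_and_report(argv: List[str], cwd: str = ".", timeout: int = 120) -> Tuple[int, str]:
    """Run argv without a shell; return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # wget logs to stderr, so merge both streams
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def describe_wget_status(code: int) -> str:
    """Translate a wget exit status into the manual's description."""
    return WGET_EXIT_STATUS.get(code, "unknown exit status")


# Stand-in child process that prints a line and exits with status 8.
code, output = run_and_report([sys.executable, "-c", "import sys; print('hi'); sys.exit(8)"])
print(code, describe_wget_status(code))
print(output)
```

Passing the arguments as a list avoids any shell quoting, so a difference between this result and the interactive run would point towards how the command line is assembled rather than towards wget itself.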

Screenshots or log output

ArchiveBox version

0.7.2
ArchiveBox v0.7.2 COMMIT_HASH=315c9f3 BUILD_TIME=2024-04-24 22:47:02 1713998822
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.10.10-arch1-1-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=1000:1000 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.9         valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.5.0          valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.12.2        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.46         valid     /app/node_modules/single-file-cli/single-file                               
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor               
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js                                  
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v124.0.6367.29  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        


[i] Data locations: (not in a data directory)

@pirate commented on GitHub (Sep 22, 2024):

Can you confirm the error is repeatable if you retry it with #1 added to the end of the URL?
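
As background to this suggestion: a URL fragment is handled on the client side and is not sent to the server, so appending `#1` produces a different URL string that still requests the same resource. A small sketch with Python's standard library illustrates this. The helper name `with_fragment` is hypothetical, and whether ArchiveBox treats the new string as a separate snapshot is an assumption inferred from the suggestion, not something confirmed in this thread.

```python
from urllib.parse import urldefrag


def with_fragment(url: str, fragment: str) -> str:
    """Return url with its fragment replaced by the given one."""
    base, _old_fragment = urldefrag(url)
    return f"{base}#{fragment}"


# Placeholder URL taken from the report above.
original = "https://user:pass@domain/+some/sub/dirs/filename.pdf"
retry = with_fragment(original, "1")

print(retry)                             # different string, ends in #1
print(urldefrag(retry).url == original)  # same resource once the fragment is dropped
```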


@agowa commented on GitHub (Sep 22, 2024):

Yes, it is. With the same URL plus an appended #1, I get the same error.


@pirate commented on GitHub (Sep 22, 2024):

OK thanks, last question: can you try it with the latest archivebox/archivebox:dev? 0.7.2 is quite old at this point, and it might've already been fixed by one of the hundreds of changes since then (in particular wget version upgrades and improvements to the CLI argument requoting logic).

mkdir quicktest && cd quicktest
docker run -it -v "$PWD:/data" archivebox/archivebox:dev init
docker run -it -v "$PWD:/data" archivebox/archivebox:dev add 'https://user:pass@domain/+some/sub/dirs/filename.pdf'
tree ./archive

@agowa commented on GitHub (Sep 22, 2024):

Sorry, but archivebox/archivebox:dev doesn't work for me at all. I first tried replacing :latest with :dev in the docker-compose file, and after you edited your post to add the quicktest commands I tried those too. But the init fails and I only get this error:

[user@PC-001 quicktest]$ docker run -it -v "$PWD:/data" archivebox/archivebox:dev init
[i] [2024-09-22 20:05:36] ArchiveBox v0.8.4: archivebox init
    > /data

Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/rich/traceback.py", line 102, in excepthook
    Traceback.from_exception(
  File "/usr/local/lib/python3.11/site-packages/rich/traceback.py", line 340, in from_exception
    rich_traceback = cls.extract(
                     ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rich/traceback.py", line 455, in extract
    locals={
           ^
  File "/usr/local/lib/python3.11/site-packages/rich/traceback.py", line 456, in <dictcomp>
    key: pretty.traverse(
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rich/pretty.py", line 874, in traverse
    node = _traverse(_object, root=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rich/pretty.py", line 667, in _traverse
    args = list(iter_rich_args(rich_repr_result))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/rich/pretty.py", line 634, in iter_rich_args
    for arg in rich_args:
  File "/usr/local/lib/python3.11/site-packages/pydantic/_internal/_repr.py", line 73, in __rich_repr__
    for name, field_repr in self.__repr_args__():
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 1069, in __repr_args__
    yield from ((k, getattr(self, k)) for k, v in self.model_computed_fields.items() if v.repr)
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 1069, in <genexpr>
    yield from ((k, getattr(self, k)) for k, v in self.model_computed_fields.items() if v.repr)
                    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic_pkgr/binprovider.py", line 176, in __getattr__
    return super().__getattr__(item)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 853, in __getattr__
    return super().__getattribute__(item)  # Raises AttributeError if appropriate
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic_pkgr/binprovider_pip.py", line 46, in INSTALLER_BIN_ABSPATH
    abspath = (bin_abspath(self.INSTALLER_BIN, PATH=None) or shutil.which(self.INSTALLER_BIN)).resolve()  # find self.INSTALLER_BIN abspath using environment path
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'resolve'

Original exception was:
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 181, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 112, in run_subcommand
    setup_django(in_memory_db=subcommand in fake_db, check_db=cmd_requires_db and not init_pending)
  File "/app/archivebox/config.py", line 1514, in setup_django
    django.setup()
  File "/usr/local/lib/python3.11/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/usr/local/lib/python3.11/site-packages/django/apps/registry.py", line 91, in populate
    app_config = AppConfig.create(entry)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/apps/config.py", line 123, in create
    mod = import_module(mod_path)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/archivebox/builtin_plugins/chrome/apps.py", line 27, in <module>
    from builtin_plugins.playwright.apps import PLAYWRIGHT_BINPROVIDER
  File "/app/archivebox/builtin_plugins/playwright/apps.py", line 31, in <module>
    from builtin_plugins.pip.apps import SYS_PIP_BINPROVIDER, VENV_PIP_BINPROVIDER, LIB_PIP_BINPROVIDER
  File "/app/archivebox/builtin_plugins/pip/apps.py", line 71, in <module>
    PIPX_PIP_BINPROVIDER = SystemPipxBinProvider()
                           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic_pkgr/binprovider_pip.py", line 64, in load_PATH_from_pip_sitepackages
    if self.INSTALLER_BIN_ABSPATH and shutil.which(self.INSTALLER_BIN_ABSPATH):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic_pkgr/binprovider.py", line 176, in __getattr__
    return super().__getattr__(item)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 853, in __getattr__
    return super().__getattribute__(item)  # Raises AttributeError if appropriate
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic_pkgr/binprovider_pip.py", line 46, in INSTALLER_BIN_ABSPATH
    abspath = (bin_abspath(self.INSTALLER_BIN, PATH=None) or shutil.which(self.INSTALLER_BIN)).resolve()  # find self.INSTALLER_BIN abspath using environment path
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'resolve'
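
The final frame of both tracebacks calls `.resolve()` on the result of `bin_abspath(...) or shutil.which(...)`. `shutil.which` returns `None` when the executable is not on the search path, so the expression evaluates to `None` if neither lookup finds the installer binary, which matches the `AttributeError` shown. The following is a minimal sketch of a lookup that tolerates a missing binary. It is an illustration only, not the pydantic_pkgr implementation, and the function name `find_installer_abspath` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_installer_abspath(bin_name: str) -> Optional[Path]:
    """Return the resolved path of bin_name, or None if it is not on PATH."""
    found = shutil.which(bin_name)  # str path, or None when nothing is found
    if found is None:
        return None
    return Path(found).resolve()


# A binary name that is assumed not to exist on any system.
missing = find_installer_abspath("definitely-not-a-real-binary-xyz")
print(missing)
```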

@pirate commented on GitHub (Sep 22, 2024):

Ah sorry, looks like the build I started last night before I went to bed never finished. Give me a sec, I'll fix it.


@agowa commented on GitHub (Oct 3, 2024):

@pirate any update on this one? Were you able to replicate the issue?


@pirate commented on GitHub (Oct 3, 2024):

The build is fixed, but I'm not sure whether the original issue is. You can give it a try:

docker pull archivebox/archivebox:dev

# remember to back up your data dir first! this is a BETA and data loss may occur
docker run -it -v "$PWD:/data" archivebox/archivebox:dev init
docker run -it -v "$PWD:/data" archivebox/archivebox:dev version
docker run -it -v "$PWD:/data" archivebox/archivebox:dev status
docker run -it -v "$PWD:/data" archivebox/archivebox:dev add 'https://user:pass@domain/+some/sub/dirs/filename.pdf'

@agowa commented on GitHub (Oct 4, 2024):

Hi, sorry, but the dev image still doesn't work as you suggest. When I try to run it in a new and completely empty folder, the init fails:

[user@PC-001 tmp]$ mkdir test
[user@PC-001 tmp]$ cd test
[user@PC-001 test]$ docker pull archivebox/archivebox:dev
(...)
Writing manifest to image destination
5876f1e823bb118732b24dedec99f3df2474d6dbb07cc3b096ce98adcd4da48d
[user@PC-001 test]$ docker run -it -v "$PWD:/data" archivebox/archivebox:dev init
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [2024-10-04 18:48:44] ArchiveBox v0.8.5: archivebox init                                                                                                                       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
[X] This folder appears to already have files in it, but no index.sqlite3 present.

    You must run init in a completely empty directory, or an existing data folder.

    Hint: To import an existing data folder make sure to cd into the folder first, 
    then run and run 'archivebox init' to pick up where you left off.

    (Always make sure your data folder is backed up first before updating ArchiveBox)
[user@PC-001 test]$ ls -la
total 28
drwxr-xr-x  5 166446 166446   120 Oct  4 20:48 .
drwxrwxrwt 32 root   root    1120 Oct  4 20:48 ..
drwxr-xr-x  2 user   users     40 Oct  4 20:48 crontabs
drwxr-xr-x  2 166446 166446    60 Oct  4 20:48 logs
-rw-r--r--  1 166446 166446 28672 Oct  4 20:48 queue.sqlite3
drwxr-xr-x  3 166446 166446    60 Oct  4 20:48 tmp
[user@PC-001 test]$ 
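
The error message states the rule being applied: init is accepted in a completely empty directory or in an existing data folder, and rejected otherwise. The listing shows `crontabs`, `logs`, `queue.sqlite3` and `tmp` with the same timestamp as the init run, which suggests, though the thread does not confirm it, that they were created during that run before the check took place. The sketch below restates the rule from the message in Python. It is not ArchiveBox's actual check, and the function name `classify_data_dir` is hypothetical.

```python
import tempfile
from pathlib import Path


def classify_data_dir(path: str) -> str:
    """Classify a directory according to the rule stated in the init error message."""
    root = Path(path)
    if (root / "index.sqlite3").exists():
        return "existing data folder"
    if not any(root.iterdir()):
        return "empty"
    return "non-empty without index.sqlite3"


# Reproduce the situation from the listing in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    before = classify_data_dir(tmp)
    (Path(tmp) / "queue.sqlite3").touch()   # a file like the one in the listing
    after = classify_data_dir(tmp)
    (Path(tmp) / "index.sqlite3").touch()
    with_index = classify_data_dir(tmp)

print(before, "|", after, "|", with_index)
```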


@agowa commented on GitHub (Oct 4, 2024):

I moved the issues with spinning up the dev image into a separate issue as I found a workaround by first doing the init using the latest image.

Regarding this issue, you're right, it still exists. The "add" command still claims "Wget failed or got an error from the server", but the "full output" command it provides succeeds.
