[GH-ISSUE #615] Bugfix: Cookies file causes wget failures #380

Closed
opened 2026-03-01 14:43:04 +03:00 by kerem · 6 comments
Owner

Originally created by @winteriscariot on GitHub (Jan 12, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/615

Describe the bug

Steps to reproduce

On Arch Linux,

Screenshots or log output

Software versions

  • OS:
  • ArchiveBox version:
  • Python version:
  • Chrome version:

@pirate commented on GitHub (Jan 13, 2021):

Please post the full output of archivebox version with no redactions.


@zapp-brannigan commented on GitHub (Jan 18, 2021):

Hi. I have had the same problem since 0.5.3:

>[+] [2021-01-18 09:45:20] "www.poftut.com/how-to-update-upgrade-a-python-package-with-pip/#2021-01-18"
    https://www.poftut.com/how-to-update-upgrade-a-python-package-with-pip/#2021-01-18
    > ./archive/1610963120.590836
      > title
      > favicon
      > wget
    ! Failed to archive link: Exception: Exception in archive_methods.save_wget(Link(url=https://www.poftut.com/how-to-update-upgrade-a-python-package-with-pip/#2021-01-18))
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/archivebox/extractors/__init__.py", line 108, in archive_link
    result = method_function(link=link, out_dir=out_dir)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/extractors/wget.py", line 121, in save_wget
    **timer.stats,
  File "<string>", line 11, in __init__
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/schema.py", line 46, in __post_init__
    self.typecheck()
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/schema.py", line 57, in typecheck
    assert all(isinstance(arg, str) and arg for arg in self.cmd)
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/__init__.py", line 133, in main
    pwd=pwd or OUTPUT_DIR,
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/__init__.py", line 69, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/archivebox_add.py", line 93, in main
    out_dir=pwd or OUTPUT_DIR,
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/main.py", line 593, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/extractors/__init__.py", line 173, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/extractors/__init__.py", line 125, in archive_link
    )) from e
Exception: Exception in archive_methods.save_wget(Link(url=https://www.poftut.com/how-to-update-upgrade-a-python-package-with-pip/#2021-01-18))

archivebox version:

>~/archive$ archivebox version
Warning: using insecure memory!
ArchiveBox v0.5.3
Cpython Linux Linux-5.4.79-v7+-armv7l-with-debian-10.6 armv7l (not in Docker)
[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.5.3          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.7.3          valid     /usr/bin/python3.7                                                          
 √  DJANGO_BINARY         v3.1.3          valid     /usr/local/lib/python3.7/dist-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v10.21.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.1.45         valid     /usr/local/bin/single-file                                                  
 √  READABILITY_BINARY    v0.1.0          valid     /usr/local/lib/node_modules/readability-extractor/readability-extractor     
  MERCURY_BINARY        -               disabled  mercury-parser                                                              
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
  YOUTUBEDL_BINARY      -               disabled  /usr/bin/youtube-dl                                                         
 √  CHROME_BINARY         v83.0.4103.116  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 
[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/local/lib/python3.7/dist-packages/archivebox                           
 √  TEMPLATES_DIR         3 files         valid     /usr/local/lib/python3.7/dist-packages/archivebox/themes                    
[i] Secrets locations:
 CHROME_USER_DATA_DIR  -               disabled                                                                              
 COOKIES_FILE          -               disabled                                                                              
[i] Data locations:
 √  OUTPUT_DIR            11 files        valid     /home/user/archive                                                        
 √  SOURCES_DIR           136 files       valid     ./sources                                                                   
 √  LOGS_DIR              0 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           101 files       valid     ./archive                                                                   
 √  CONFIG_FILE           636.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             1000.0 KB       valid     ./index.sqlite3                                                             

@pirate commented on GitHub (Jan 18, 2021):

It looks like COOKIES_FILE is disabled in that output. Can you post the value you're using for it and exactly how you set it (e.g. via the config file, the archivebox config command, an env var, etc.)?


@zapp-brannigan commented on GitHub (Jan 18, 2021):

I've just commented it out in my config:

root@raspberrypi:/export/container/debian/home/user/archive# grep -i cook ArchiveBox.conf
#COOKIES_FILE = /home/user/cookies.txt
root@raspberrypi:/export/container/debian/home/user/archive# file ../cookies.txt
../cookies.txt: Netscape cookie, UTF-8 Unicode text, with very long lines
root@raspberrypi:/export/container/debian/home/user/archive#

A long time ago I used this command to enable cookies:

archivebox config --set COOKIES_FILE=/home/user/cookies.txt


@pirate commented on GitHub (Jan 18, 2021):

Ah, I think it's because the value is read as a pathlib.Path during config loading and needs to be converted to a str before it is added to the wget command (CMD = ['wget', ...]) during extraction.
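
For anyone reading along, here is a minimal sketch of that failure mode. It is not the actual ArchiveBox code: `build_wget_cmd`, `typecheck_cmd`, and the `stringify` switch are hypothetical names made up for illustration, and the URL is a placeholder. Only the assertion is copied from the traceback above, and `--load-cookies` is the standard wget flag. The sketch assumes the cookies path ends up in the command list as a `pathlib.Path`.

```python
from pathlib import Path
from typing import List, Optional, Union


def typecheck_cmd(cmd: List[object]) -> None:
    # Same check as the assertion shown in the traceback (schema.py, typecheck).
    assert all(isinstance(arg, str) and arg for arg in cmd)


def build_wget_cmd(url: str,
                   cookies_file: Optional[Union[str, Path]],
                   stringify: bool) -> List[object]:
    # Hypothetical helper: builds a wget argument list, optionally
    # converting the cookies path to str before it goes into the list.
    cookie_arg = str(cookies_file) if stringify else cookies_file
    return [
        'wget',
        *(['--load-cookies', cookie_arg] if cookies_file else []),
        url,
    ]


# Assumption: config loading hands back a Path object, not a str.
cookies = Path('/home/user/cookies.txt')
url = 'https://example.com'  # placeholder URL

broken = build_wget_cmd(url, cookies, stringify=False)
fixed = build_wget_cmd(url, cookies, stringify=True)

try:
    typecheck_cmd(broken)
    broken_passes = True
except AssertionError:
    # A Path instance is not a str, so the assertion fails.
    broken_passes = False

typecheck_cmd(fixed)  # passes: every argument is a non-empty str
```

In this sketch the only difference between the two lists is the `str(...)` call on the cookies path, which is the kind of conversion described above.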


@pirate commented on GitHub (Jan 21, 2021):

Fixed: ef7711f
