[GH-ISSUE #196] Archive Method: Wget: unrecognized option '--compression=auto' #135

Closed
opened 2026-03-01 14:40:53 +03:00 by kerem · 6 comments

Originally created by @mawmawmawm on GitHub (Mar 27, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/196

Hey there, I just pulled master, poked around, and realized that the wget command isn't working correctly. I'm running via docker-compose.

Describe the bug

wget is being executed with the following standard parameters:

```
wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent --compression=auto -e robots=off --restrict-file-names=unix --timeout=60 --warc-file=warc/1553663964 --page-requisites "--user-agent=ArchiveBox/58c9b47d4 (+https://github.com/pirate/ArchiveBox/) wget/1.18" https://www.theatlantic.com/health/archive/2014/05/why-medicine-is-cheaper-in-germany/371418/
```

If I run this outside of the script, I get `wget: unrecognized option '--compression=auto'`, which makes wget fail entirely.

Steps to reproduce

Run the wget command above.

My workaround was to manually edit `archivebox/archive_methods.py` inside the container and comment out the `compression=auto` option there.
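
A minimal sketch of an alternative to commenting the flag out: probe the installed wget and only add `--compression=auto` when it is accepted, using the same zero-exit-status test on `wget --compression=auto --help` that comes up later in this thread. The helper names `wget_supports_compression` and `build_wget_args` are hypothetical (not ArchiveBox's actual code), and the argument list is trimmed from the command above (no WARC or user-agent options).

```python
import subprocess
from typing import List


def wget_supports_compression(wget_binary: str = "wget") -> bool:
    """Return True if `wget_binary` accepts --compression=auto (hypothetical helper).

    Older wget builds reject the unknown option and exit with a non-zero
    status, so a zero exit code is treated as support.
    """
    try:
        result = subprocess.run(
            [wget_binary, "--compression=auto", "--help"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (FileNotFoundError, PermissionError):
        # The binary is missing or not executable, so the option cannot be used.
        return False
    return result.returncode == 0


def build_wget_args(url: str, auto_compression: bool) -> List[str]:
    """Assemble a wget command line, adding --compression=auto only when supported."""
    args = [
        "wget",
        "--no-verbose",
        "--adjust-extension",
        "--convert-links",
        "--force-directories",
        "--backup-converted",
        "--span-hosts",
        "--no-parent",
    ]
    if auto_compression:
        args.append("--compression=auto")
    args += ["-e", "robots=off", "--restrict-file-names=unix", "--timeout=60", "--page-requisites"]
    args.append(url)
    return args
```

In practice the probe would run once at startup, e.g. `build_wget_args(url, wget_supports_compression())` with the result cached, rather than spawning an extra wget process before every download.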


@pirate commented on GitHub (Mar 27, 2019):

This was just fixed; can you pull and try again?


@go2tom42 commented on GitHub (Mar 30, 2019):

I just pulled it and I get the same error.


@pirate commented on GitHub (Mar 30, 2019):

@go2tom42 Strange. Can you give me this info so I can debug it:

  • the current git commit you're on
  • the output of `wget --compression=auto --help` on your machine (specifically I need the exit status: whether it's 0 or something else)
  • if possible, also run this in a `python` shell after `cd ArchiveBox/archivebox`:
```python
>>> from .config import WGET_AUTO_COMPRESSION
>>> print(WGET_AUTO_COMPRESSION)
True
```
```python
>>> from subprocess import DEVNULL
>>> from .util import run
>>> run(["wget", "--compression=auto", "--help"], stdout=DEVNULL).returncode
0
```

@go2tom42 commented on GitHub (Mar 30, 2019):

I'm using the image from https://hub.docker.com/r/nikisweeting/archivebox

Output from `wget --compression=auto --help` inside the container:

```
$ wget --compression=auto --help
wget: unrecognized option '--compression=auto'
Usage: wget [OPTION]... [URL]...

Try 'wget --help' for more options.
$
```

From `home/pptruser/app/archivebox` I ran:

```python
python3
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from .config import WGET_AUTO_COMPRESSION
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Parent module '' not loaded, cannot perform relative import
```

`from .util import run` gives the same result as above.
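
The `SystemError` above is standard Python behavior rather than an ArchiveBox bug: in an interactive session the `__main__` module has no parent package, so any `from .module import ...` fails (Python 3.5 reports it as shown; Python 3.6 and later raise `ImportError: attempted relative import with no known parent package`). An absolute import from the directory above the package avoids this. A minimal sketch, assuming the code lives in an importable package; it builds a throwaway `demo_pkg` instead of touching ArchiveBox so it runs anywhere, and `import_from_package` is a hypothetical helper. For ArchiveBox the equivalent would be starting `python3` one directory up and running `from archivebox.config import WGET_AUTO_COMPRESSION`, assuming `archivebox/` was a package at that commit.

```python
import importlib
import os
import sys
import tempfile


def import_from_package(parent_dir: str, dotted_name: str):
    """Import `dotted_name` (e.g. "demo_pkg.config") with `parent_dir` on sys.path.

    Absolute imports work in a REPL, unlike `from .config import ...`,
    because they do not require the calling module to belong to a package.
    """
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    # Files created after interpreter start may need the finder caches refreshed.
    importlib.invalidate_caches()
    return importlib.import_module(dotted_name)


# Self-contained demo: create a throwaway package "demo_pkg" with a config module.
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "demo_pkg")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "config.py"), "w") as f:
    f.write("WGET_AUTO_COMPRESSION = False\n")

config = import_from_package(tmp, "demo_pkg.config")
print(config.WGET_AUTO_COMPRESSION)
```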

The wget inside the container is version 1.18; I believe `--compression` was added in version 1.19.2: https://lists.gnu.org/archive/html/info-gnu/2017-10/msg00007.html
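
Following that diagnosis, support could also be inferred from the version number. A minimal sketch that parses the first line of `wget --version` (assumed to read like `GNU Wget 1.18 built on linux-gnu.`) and compares it against 1.19.2; the function names are hypothetical and the output format is an assumption about GNU Wget.

```python
import re
from typing import Optional, Tuple

# Assumed minimum version: 1.19.2, per the release announcement linked above.
MIN_COMPRESSION_VERSION = (1, 19, 2)


def parse_wget_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract a version tuple such as (1, 18) from `wget --version` output."""
    match = re.search(r"GNU Wget (\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def supports_compression(version: Optional[Tuple[int, ...]]) -> bool:
    """True if the version is at least 1.19.2; unknown versions count as unsupported."""
    # Tuple comparison is element-wise, so (1, 19) < (1, 19, 2) < (1, 20).
    return version is not None and version >= MIN_COMPRESSION_VERSION
```

The text would come from something like `subprocess.run(["wget", "--version"], stdout=subprocess.PIPE, universal_newlines=True).stdout`. Probing the flag directly is usually more robust, since it tests the option itself instead of inferring it from a version string.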


@bradparks commented on GitHub (Mar 31, 2019):

I get this error as well, on a fresh clone made on March 30 at 9:12 PM AST.


@pirate commented on GitHub (Jul 24, 2020):

This should be fixed on the latest django branch. If you're still seeing any issues comment back here and I'll reopen the ticket.

```bash
git checkout django
git pull
docker build . -t archivebox
docker run -v $PWD/output:/data archivebox init
docker run -v $PWD/output:/data archivebox add 'https://example.com'
```