[GH-ISSUE #1165] Bug: Setting up via pip on Windows 10 results in strange behavior, calls to os.getpgid, failure to add page to archive #723

Closed
opened 2026-03-01 14:45:50 +03:00 by kerem · 2 comments
Owner

Originally created by @pineapplemachine on GitHub (Jun 24, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1165

Describe the bug

I freshly installed ArchiveBox on Windows 10 IoT Enterprise LTSC 21H2 using pip and Python 3.

See below for log output, including python, pip, and archivebox version information.

After installing the pip package and running archivebox init in an empty directory, archivebox add https://archivebox.io/ logged platform-related errors: several extractors attempted to call os.getpgid, which is not available on Windows.
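For illustration, here is a minimal sketch of how a call to os.getpgid could be guarded so that it does not raise AttributeError on Windows. This is not ArchiveBox's actual code; the helper name process_group_id and the fallback of returning the PID itself are assumptions made for this example only.

```python
import os


def process_group_id(pid: int) -> int:
    """Return the process group ID of pid, or pid itself where unsupported.

    os.getpgid is only defined on Unix. On Windows the attribute does not
    exist, which produces the AttributeError shown in the log.
    """
    getpgid = getattr(os, "getpgid", None)
    if getpgid is None:
        # Hypothetical fallback (an assumption of this sketch):
        # treat the process as its own group.
        return pid
    return getpgid(pid)


if __name__ == "__main__":
    print(process_group_id(os.getpid()))
```

Looking the function up with getattr keeps the platform check in one place, so the same call site works on both Unix and Windows.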

I tried running archivebox server 0.0.0.0:8001 and then attempted to view the archived page in Firefox. A link to the archived page was visible at http://localhost:8001/public/, but when I clicked the link I was shown this error:

Snapshot [1687594625.565415] exists in DB, but resource 1687594625.565415/singlefile.html does not exist in snapshot dir yet.

Maybe this resource type is not availabe for this Snapshot, or the archiving process has not completed yet?

# run this cmd to finish archiving this Snapshot
archivebox update -t timestamp 1687594625.565415

The archivebox homepage suggests that archivebox can be installed via pip on Windows:

image

Steps to reproduce

See above.

ArchiveBox version

See below.

Screenshots or log output

Here is a complete CLI log of installing the pip package, setting up archivebox, and attempting to add and access an archived page. See the end of the log for version information for python, pip, and archivebox.

E:\Sync\Records
λ pip install archivebox
Collecting archivebox
  Downloading archivebox-0.6.2-py3-none-any.whl (489 kB)
     --------------------------------- 489.2/489.2 kB 5.1 MB/s eta 0:00:00
Requirement already satisfied: requests>=2.24.0 in c:\python\python-3.10.2\lib\site-packages (from archivebox) (2.27.1)
Requirement already satisfied: mypy-extensions>=0.4.3 in c:\python\python-3.10.2\lib\site-packages (from archivebox) (1.0.0)
Collecting django<3.2,>=3.1.3 (from archivebox)
  Downloading Django-3.1.14-py3-none-any.whl (7.8 MB)
     ------------------------------------- 7.8/7.8 MB 4.9 MB/s eta 0:00:00
Collecting django-extensions>=3.0.3 (from archivebox)
  Downloading django_extensions-3.2.3-py3-none-any.whl (229 kB)
     --------------------------------- 229.9/229.9 kB 4.7 MB/s eta 0:00:00
Collecting dateparser (from archivebox)
  Downloading dateparser-1.1.8-py2.py3-none-any.whl (293 kB)
     --------------------------------- 293.8/293.8 kB 4.6 MB/s eta 0:00:00
Collecting ipython (from archivebox)
  Downloading ipython-8.14.0-py3-none-any.whl (798 kB)
     --------------------------------- 798.7/798.7 kB 5.1 MB/s eta 0:00:00
Requirement already satisfied: youtube-dl in c:\python\python-3.10.2\lib\site-packages (from archivebox) (2021.12.17)
Collecting python-crontab>=2.5.1 (from archivebox)
  Downloading python_crontab-2.7.1-py3-none-any.whl (26 kB)
Collecting croniter>=0.3.34 (from archivebox)
  Downloading croniter-1.4.1-py2.py3-none-any.whl (19 kB)
Collecting w3lib>=1.22.0 (from archivebox)
  Downloading w3lib-2.1.1-py3-none-any.whl (21 kB)
Requirement already satisfied: python-dateutil in c:\python\python-3.10.2\lib\site-packages (from croniter>=0.3.34->archivebox) (2.8.2)
Collecting asgiref<4,>=3.2.10 (from django<3.2,>=3.1.3->archivebox)
  Downloading asgiref-3.7.2-py3-none-any.whl (24 kB)
Requirement already satisfied: pytz in c:\python\python-3.10.2\lib\site-packages (from django<3.2,>=3.1.3->archivebox) (2021.3)
Collecting sqlparse>=0.2.2 (from django<3.2,>=3.1.3->archivebox)
  Downloading sqlparse-0.4.4-py3-none-any.whl (41 kB)
     ---------------------------------------- 41.2/41.2 kB ? eta 0:00:00
INFO: pip is looking at multiple versions of django-extensions to determine which version is compatible with other requirements. This could take a while.
Collecting django-extensions>=3.0.3 (from archivebox)
  Downloading django_extensions-3.2.1-py3-none-any.whl (229 kB)
     --------------------------------- 229.4/229.4 kB 3.5 MB/s eta 0:00:00
  Downloading django_extensions-3.2.0-py3-none-any.whl (229 kB)
     --------------------------------- 229.1/229.1 kB 4.7 MB/s eta 0:00:00
  Downloading django_extensions-3.1.5-py3-none-any.whl (224 kB)
     --------------------------------- 224.2/224.2 kB 4.6 MB/s eta 0:00:00
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\python\python-3.10.2\lib\site-packages (from requests>=2.24.0->archivebox) (1.26.8)
Requirement already satisfied: certifi>=2017.4.17 in c:\python\python-3.10.2\lib\site-packages (from requests>=2.24.0->archivebox) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\python\python-3.10.2\lib\site-packages (from requests>=2.24.0->archivebox) (2.0.10)
Requirement already satisfied: idna<4,>=2.5 in c:\python\python-3.10.2\lib\site-packages (from requests>=2.24.0->archivebox) (3.3)
Collecting regex!=2019.02.19,!=2021.8.27 (from dateparser->archivebox)
  Downloading regex-2023.6.3-cp310-cp310-win_amd64.whl (268 kB)
     --------------------------------- 268.0/268.0 kB 4.1 MB/s eta 0:00:00
Collecting tzlocal (from dateparser->archivebox)
  Downloading tzlocal-5.0.1-py3-none-any.whl (20 kB)
Collecting backcall (from ipython->archivebox)
  Downloading backcall-0.2.0-py2.py3-none-any.whl (11 kB)
Requirement already satisfied: decorator in c:\python\python-3.10.2\lib\site-packages (from ipython->archivebox) (4.4.2)
Collecting jedi>=0.16 (from ipython->archivebox)
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
     ------------------------------------- 1.6/1.6 MB 5.0 MB/s eta 0:00:00
Collecting matplotlib-inline (from ipython->archivebox)
  Downloading matplotlib_inline-0.1.6-py3-none-any.whl (9.4 kB)
Collecting pickleshare (from ipython->archivebox)
  Downloading pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 (from ipython->archivebox)

  Downloading prompt_toolkit-3.0.38-py3-none-any.whl (385 kB)
     --------------------------------- 385.8/385.8 kB 4.8 MB/s eta 0:00:00
Requirement already satisfied: pygments>=2.4.0 in c:\python\python-3.10.2\lib\site-packages (from ipython->archivebox) (2.14.0)
Collecting stack-data (from ipython->archivebox)
  Downloading stack_data-0.6.2-py3-none-any.whl (24 kB)
Collecting traitlets>=5 (from ipython->archivebox)
  Downloading traitlets-5.9.0-py3-none-any.whl (117 kB)
     --------------------------------- 117.4/117.4 kB 3.5 MB/s eta 0:00:00
Requirement already satisfied: colorama in c:\python\python-3.10.2\lib\site-packages (from ipython->archivebox) (0.4.4)
Requirement already satisfied: typing-extensions>=4 in c:\python\python-3.10.2\lib\site-packages (from asgiref<4,>=3.2.10->django<3.2,>=3.1.3->archivebox) (4.0.1)
Collecting parso<0.9.0,>=0.8.0 (from jedi>=0.16->ipython->archivebox)
  Downloading parso-0.8.3-py2.py3-none-any.whl (100 kB)
     --------------------------------- 100.8/100.8 kB 5.7 MB/s eta 0:00:00
Collecting wcwidth (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython->archivebox)
  Downloading wcwidth-0.2.6-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: six>=1.5 in c:\python\python-3.10.2\lib\site-packages (from python-dateutil->croniter>=0.3.34->archivebox) (1.16.0)
Collecting executing>=1.2.0 (from stack-data->ipython->archivebox)
  Downloading executing-1.2.0-py2.py3-none-any.whl (24 kB)
Collecting asttokens>=2.1.0 (from stack-data->ipython->archivebox)
  Downloading asttokens-2.2.1-py2.py3-none-any.whl (26 kB)
Collecting pure-eval (from stack-data->ipython->archivebox)
  Downloading pure_eval-0.2.2-py3-none-any.whl (11 kB)
Collecting tzdata (from tzlocal->dateparser->archivebox)
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
     --------------------------------- 341.8/341.8 kB 5.3 MB/s eta 0:00:00
Installing collected packages: wcwidth, pure-eval, pickleshare, executing, backcall, w3lib, tzdata, traitlets, sqlparse, regex, prompt-toolkit, parso, asttokens, asgiref, tzlocal, stack-data, python-crontab, matplotlib-inline, jedi, django, croniter, ipython, django-extensions, dateparser, archivebox
Successfully installed archivebox-0.6.2 asgiref-3.7.2 asttokens-2.2.1 backcall-0.2.0 croniter-1.4.1 dateparser-1.1.8 django-3.1.14 django-extensions-3.1.5 executing-1.2.0 ipython-8.14.0 jedi-0.18.2 matplotlib-inline-0.1.6 parso-0.8.3 pickleshare-0.7.5 prompt-toolkit-3.0.38 pure-eval-0.2.2 python-crontab-2.7.1 regex-2023.6.3 sqlparse-0.4.4 stack-data-0.6.2 traitlets-5.9.0 tzdata-2023.3 tzlocal-5.0.1 w3lib-2.1.1 wcwidth-0.2.6

E:\Sync\Records
λ archivebox help
Welcome to ArchiveBox v0.6.2!

To import an existing archive (from a previous version of ArchiveBox):
    1. cd into your data dir OUTPUT_DIR (usually ArchiveBox/output) and run:

    2. archivebox init

To start a new archive:
    1. Create an empty directory, then cd into it and run:
    2. archivebox init

For more information, see the documentation here:
    https://github.com/ArchiveBox/ArchiveBox/wiki
    
E:\Sync\Records
λ mkdir archivebox

E:\Sync\Records
λ cd archivebox

E:\Sync\Records\archivebox
λ archivebox init
[i] [2023-06-24 08:16:41] ArchiveBox v0.6.2: archivebox init
    > E:\Sync\Records\archivebox

[+] Initializing a new ArchiveBox v0.6.2 collection...
----------------------------------------------------------------------

[+] Building archive folder structure...
    + ./archive, ./sources, ./logs...
    + ./ArchiveBox.conf...

[+] Building main SQL index and running initial migrations...
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, core, sessions
    Running migrations:
    Applying contenttypes.0001_initial... OK
    Applying auth.0001_initial... OK
    Applying admin.0001_initial... OK
    Applying admin.0002_logentry_remove_auto_add... OK
    Applying admin.0003_logentry_add_action_flag_choices... OK
    Applying contenttypes.0002_remove_content_type_name... OK
    Applying auth.0002_alter_permission_name_max_length... OK
    Applying auth.0003_alter_user_email_max_length... OK
    Applying auth.0004_alter_user_username_opts... OK
    Applying auth.0005_alter_user_last_login_null... OK
    Applying auth.0006_require_contenttypes_0002... OK
    Applying auth.0007_alter_validators_add_error_messages... OK
    Applying auth.0008_alter_user_username_max_length... OK
    Applying auth.0009_alter_user_last_name_max_length... OK
    Applying auth.0010_alter_group_name_max_length... OK
    Applying auth.0011_update_proxy_permissions... OK
    Applying auth.0012_alter_user_first_name_max_length... OK
    Applying core.0001_initial... OK
    Applying core.0002_auto_20200625_1521... OK
    Applying core.0003_auto_20200630_1034... OK
    Applying core.0004_auto_20200713_1552... OK
    Applying core.0005_auto_20200728_0326... OK
    Applying core.0006_auto_20201012_1520... OK
    Applying core.0007_archiveresult... OK
    Applying core.0008_auto_20210105_1421... OK
    Applying core.0009_auto_20210216_1038... OK
    Applying core.0010_auto_20210216_1055... OK
    Applying core.0011_auto_20210216_1331... OK
    Applying core.0012_auto_20210216_1425... OK
    Applying core.0013_auto_20210218_0729... OK
    Applying core.0014_auto_20210218_0729... OK
    Applying core.0015_auto_20210218_0730... OK
    Applying core.0016_auto_20210218_1204... OK
    Applying core.0017_auto_20210219_0211... OK
    Applying core.0018_auto_20210327_0952... OK
    Applying core.0019_auto_20210401_0654... OK
    Applying core.0020_auto_20210410_1031... OK
    Applying sessions.0001_initial... OK

    √ ./index.sqlite3

[*] Checking links from indexes and archive folders (safe to Ctrl+C)...

[*] [2023-06-24 08:16:47] Writing 0 links to main index...

    √ ./index.sqlite3

----------------------------------------------------------------------
[√] Done. A new ArchiveBox collection was initialized (0 links).

    Hint: To view your archive index, run:
        archivebox server  # then visit http://127.0.0.1:8000

    To add new links, you can run:
        archivebox add ~/some/path/or/url/to/list_of_links.txt

    For more usage and examples, run:
        archivebox help

E:\Sync\Records\archivebox
λ ls
archive/  ArchiveBox.conf  index.sqlite3  logs/  sources/

E:\Sync\Records\archivebox
λ archivebox add https://archivebox.io/
[i] [2023-06-24 08:17:04] ArchiveBox v0.6.2: archivebox add https://archivebox.io/
    > E:\Sync\Records\archivebox

[!] Warning: Missing 4 recommended dependencies
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False

    ! RIPGREP_BINARY: rg (unable to detect version)

[+] [2023-06-24 08:17:05] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/E:\Sync\Records\archivebox\sources\1687594625-import.txt


    > Parsed 1 URLs from input (Generic TXT)
    > Found 1 new URLs not already in index

[*] [2023-06-24 08:17:05] Writing 1 links to main index...

    √ ./index.sqlite3

[▶] [2023-06-24 08:17:05] Starting archiving of 1 snapshots in index...

[+] [2023-06-24 08:17:05] "archivebox.io"
    https://archivebox.io/
    > E:\Sync\Records\archivebox\archive\1687594625.565415
      > title

      > favicon

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            curl --silent --location --compressed --max-time 60 --output favicon.ico --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://www.google.com/s2/favicons?domain=archivebox.io

      > headers

      > wget

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --timeout=60 --restrict-file-names=windows --warc-file=E:\Sync\Records\archivebox\archive\1687594625.565415\warc\1687594626 --page-requisites "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.21.1" --compression=auto https://archivebox.io/

      > readability

        Extractor failed:
            FileNotFoundError [WinError 2] The system cannot find the file specified
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            readability-extractor ./{singlefile,dom}.html

      > mercury

        Extractor failed:
            FileNotFoundError [WinError 2] The system cannot find the file specified
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            mercury-parser https://archivebox.io/ --format=text

      > media

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            youtube-dl --write-description --write-info-json --write-annotations --write-thumbnail --no-call-home --write-sub --all-subs --write-auto-sub --convert-subs=srt --yes-playlist --continue --ignore-errors --geo-bypass --add-metadata --max-filesize=750m https://archivebox.io/

      > archive_org

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            curl --silent --location --compressed --head --max-time 60 --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://web.archive.org/save/https://archivebox.io/

        5 files (349.9 KB) in 0:00:02s

[√] [2023-06-24 08:17:08] Update of 1 pages complete (3.04 sec)
    - 0 links skipped
    - 1 links updated
    - 1 links had errors

    Hint: To manage your archive in a Web UI, run:
        archivebox server 0.0.0.0:8000

E:\Sync\Records\archivebox
λ archivebox server 0.0.0.0:8001
[i] [2023-06-24 08:17:26] ArchiveBox v0.6.2: archivebox server 0.0.0.0:8001
    > E:\Sync\Records\archivebox

[+] Starting ArchiveBox webserver...
    > Logging errors to ./logs/errors.log
[!] No admin users exist yet, you will not be able to edit links in the UI.

    To create an admin user, run:
        archivebox manage createsuperuser

Performing system checks...

System check identified no issues (0 silenced).
June 24, 2023 - 08:17:27
Django version 3.1.14, using settings 'core.settings'
Starting development server at http://0.0.0.0:8001/
Quit the server with CTRL-BREAK.
"GET / HTTP/1.1" 302 0
"GET /public HTTP/1.1" 301 0
"GET /public/ HTTP/1.1" 200 7789
Not Found: /archive/1687594625.565415/favicon.ico
Not Found: /archive/1687594625.565415/favicon.ico
"GET /archive/1687594625.565415/index.html HTTP/1.1" 200 241603
Not Found: /archive/1687594625.565415/screenshot.png
"GET /archive/1687594625.565415/screenshot.png HTTP/1.1" 404 1242
Not Found: /archive/1687594625.565415/favicon.ico
Not Found: /archive/1687594625.565415/singlefile.html
Not Found: /archive/1687594625.565415/output.pdf
"GET /archive/1687594625.565415/singlefile.html HTTP/1.1" 404 1243
Not Found: /archive/1687594625.565415/readability/content.html
Not Found: /archive/1687594625.565415/output.html
"GET /archive/1687594625.565415/output.pdf HTTP/1.1" 404 1238
"GET /archive/1687594625.565415/headers.json HTTP/1.1" 200 882
"GET /archive/1687594625.565415/readability/content.html HTTP/1.1" 404 1252 "GET /archive/1687594625.565415/output.html HTTP/1.1" 404 1239
Not Found: /archive/1687594625.565415/mercury/content.html
Not Found: /archive/1687594625.565415/git/
"GET /archive/1687594625.565415/mercury/content.html HTTP/1.1" 404 1248
Not Found: /archive/1687594625.565415/singlefile.html
"GET /archive/1687594625.565415/git/ HTTP/1.1" 404 1232
"GET /archive/1687594625.565415/archivebox.io%5Cindex.html HTTP/1.1" 200 80378
"GET /archive/1687594625.565415/singlefile.html HTTP/1.1" 404 1243
"GET /archive/1687594625.565415/media/ HTTP/1.1" 200 401
Not Found: /assets/css/style.css
Not Found: /assets/js/headsmart.min.js
Not Found: /assets/js/modernizr.js
"GET /assets/css/style.css?v=0d26538a4bea3671e3e2b7c202b1ebd72b564561 HTTP/1.1" 404 179
"GET /assets/js/headsmart.min.js HTTP/1.1" 404 179
"GET /assets/js/modernizr.js HTTP/1.1" 404 179
Not Found: /assets/css/non-screen.css
Not Found: /assets/css/mobile.css
"GET /assets/css/non-screen.css HTTP/1.1" 404 179
"GET /assets/css/mobile.css HTTP/1.1" 404 179
Not Found: /assets/js/headsmart.min.js
"GET /assets/js/headsmart.min.js HTTP/1.1" 404 179

E:\Sync\Records\archivebox
λ pip --version
pip 23.1.2 from C:\Users\Sophie\AppData\Roaming\Python\Python310\site-packages\pip (python 3.10)

E:\Sync\Records\archivebox
λ python --version
Python 3.10.2

E:\Sync\Records\archivebox
λ archivebox --version
ArchiveBox v0.6.2
Cpython Windows Windows-10-10.0.19044-SP0 AMD64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep


[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     C:\Python\python-3.10.2\Scripts\archivebox.exe
 √  PYTHON_BINARY         v3.10.2         valid     C:\Python\python-3.10.2\python.exe
 √  DJANGO_BINARY         v3.1.14         valid     C:\Python\python-3.10.2\Lib\site-packages\django\bin\django-admin.py
 √  CURL_BINARY           v7.55.1         valid     C:\Windows\system32\curl.EXE
 √  WGET_BINARY           v1.21.1         valid     C:\msys64\usr\bin\wget.EXE
 √  NODE_BINARY           v18.12.0        valid     C:\node\node-v18.12.0\node.EXE
 X  SINGLEFILE_BINARY     ?               invalid   single-file

 X  READABILITY_BINARY    ?               invalid   readability-extractor

 X  MERCURY_BINARY        ?               invalid   mercury-parser

 √  GIT_BINARY            v2.29.1.        valid     C:\apps\cmder.1.3.18\vendor\git-for-windows\cmd\git.EXE
 √  YOUTUBEDL_BINARY      v2021.12.17     valid     C:\Python\python-3.10.2\Scripts\youtube-dl.EXE
 -  CHROME_BINARY         -               disabled

 X  RIPGREP_BINARY        ?               invalid   rg


[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     C:\Python\python-3.10.2\Lib\site-packages\archivebox
 √  TEMPLATES_DIR         3 files         valid     C:\Python\python-3.10.2\Lib\site-packages\archivebox\templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled


[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled

 -  COOKIES_FILE          -               disabled


[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     E:\Sync\Records\archivebox
 √  SOURCES_DIR           1 files         valid     .\sources

 √  LOGS_DIR              1 files         valid     .\logs

 √  ARCHIVE_DIR           1 files         valid     .\archive

 √  CONFIG_FILE           84.0 Bytes      valid     .\ArchiveBox.conf

 √  SQL_INDEX             208.0 KB        valid     .\index.sqlite3


[!] Warning: Missing 4 recommended dependencies
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False

    ! RIPGREP_BINARY: rg (unable to detect version)

Update: I also tried running archivebox setup, which failed with a different error while installing the npm packages ([WinError 2], see the log below). Running archivebox update https://archivebox.io/ afterwards produced the same os.getpgid-related errors.

E:\Sync\Records\archivebox
λ archivebox setup
[i] [2023-06-24 08:36:16] ArchiveBox v0.6.2: archivebox setup
    > E:\Sync\Records\archivebox


[+] Creating new admin user for the Web UI...
Username (leave blank to use '[name]'): [name]
Email address: [email]
Password:
Password (again):
Superuser created successfully.

[+] Installing enabled ArchiveBox dependencies automatically...

    Installing YOUTUBEDL_BINARY automatically using pip...
2021.12.17 is already installed youtube-dl

    Installing CHROME_BINARY automatically using playwright...

    Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
[X] Failed to install npm packages: [WinError 2] The system cannot find the file specified
    Hint: Try deleting E:\Sync\Records\archivebox/node_modules and running it again

E:\Sync\Records\archivebox  (archivebox@0.6.2)
λ ls
archive/  ArchiveBox.conf  index.sqlite3  logs/  package.json  sources/

E:\Sync\Records\archivebox  (archivebox@0.6.2)
λ archivebox update https://archivebox.io/
[i] [2023-06-24 08:41:09] ArchiveBox v0.6.2: archivebox update https://archivebox.io/
    > E:\Sync\Records\archivebox

[!] Warning: Missing 4 recommended dependencies
    ! SINGLEFILE_BINARY: single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False

    ! RIPGREP_BINARY: rg (unable to detect version)


[▶] [2023-06-24 08:41:10] Starting archiving of 1 snapshots in index...

[√] [2023-06-24 08:41:10] "ArchiveBox | � Open source self-hosted web archi ving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more…"
    https://archivebox.io/
    √ E:\Sync\Records\archivebox\archive\1687594625.565415
      > favicon

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            curl --silent --location --compressed --max-time 60 --output favicon.ico --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://www.google.com/s2/favicons?domain=archivebox.io

      > readability

        Extractor failed:
            FileNotFoundError [WinError 2] The system cannot find the file specified
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            readability-extractor ./{singlefile,dom}.html

      > archive_org

        Extractor failed:
            AttributeError module 'os' has no attribute 'getpgid'
        Run to see full output:
            cd E:\Sync\Records\archivebox\archive\1687594625.565415;
            curl --silent --location --compressed --head --max-time 60 --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://web.archive.org/save/https://archivebox.io/

        5 files (352.8 KB) in 0:00:00s

[√] [2023-06-24 08:41:11] Update of 1 pages complete (0.38 sec)
    - 0 links skipped
    - 1 links updated
    - 1 links had errors

    Hint: To manage your archive in a Web UI, run:
        archivebox server 0.0.0.0:8000

E:\Sync\Records\archivebox  (archivebox@0.6.2)
λ npm --version
8.19.2

E:\Sync\Records\archivebox  (archivebox@0.6.2)
λ node --version
v18.12.0
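One plausible cause of the [WinError 2] reported by archivebox setup, which the log above does not confirm, is that npm on Windows is typically installed as a batch script (npm.cmd). Launching the bare name "npm" through subprocess without a shell then fails, even though npm --version works in the terminal. The sketch below resolves the executable with shutil.which before running it. The helper names resolve_command and run_resolved are hypothetical and not part of ArchiveBox.

```python
import shutil
import subprocess
from typing import List, Optional


def resolve_command(cmd: List[str]) -> Optional[List[str]]:
    """Return cmd with its first element replaced by the full path found on
    PATH, or None if no matching executable exists.

    On Windows, shutil.which honours PATHEXT, so "npm" can resolve to
    "npm.cmd".
    """
    executable = shutil.which(cmd[0])
    if executable is None:
        return None
    return [executable, *cmd[1:]]


def run_resolved(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run cmd after resolving its executable; raise a clear error if missing."""
    resolved = resolve_command(cmd)
    if resolved is None:
        raise FileNotFoundError(f"executable not found on PATH: {cmd[0]}")
    return subprocess.run(resolved, capture_output=True, text=True, check=False)
```

For example, run_resolved(["npm", "--version"]) would either run the resolved npm script or raise an error that names the missing executable, rather than a bare "The system cannot find the file specified".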
OK Applying admin.0001_initial... OK Applying admin.0002_logentry_remove_auto_add... OK Applying admin.0003_logentry_add_action_flag_choices... OK Applying contenttypes.0002_remove_content_type_name... OK Applying auth.0002_alter_permission_name_max_length... OK Applying auth.0003_alter_user_email_max_length... OK Applying auth.0004_alter_user_username_opts... OK Applying auth.0005_alter_user_last_login_null... OK Applying auth.0006_require_contenttypes_0002... OK Applying auth.0007_alter_validators_add_error_messages... OK Applying auth.0008_alter_user_username_max_length... OK Applying auth.0009_alter_user_last_name_max_length... OK Applying auth.0010_alter_group_name_max_length... OK Applying auth.0011_update_proxy_permissions... OK Applying auth.0012_alter_user_first_name_max_length... OK Applying core.0001_initial... OK Applying core.0002_auto_20200625_1521... OK Applying core.0003_auto_20200630_1034... OK Applying core.0004_auto_20200713_1552... OK Applying core.0005_auto_20200728_0326... OK Applying core.0006_auto_20201012_1520... OK Applying core.0007_archiveresult... OK Applying core.0008_auto_20210105_1421... OK Applying core.0009_auto_20210216_1038... OK Applying core.0010_auto_20210216_1055... OK Applying core.0011_auto_20210216_1331... OK Applying core.0012_auto_20210216_1425... OK Applying core.0013_auto_20210218_0729... OK Applying core.0014_auto_20210218_0729... OK Applying core.0015_auto_20210218_0730... OK Applying core.0016_auto_20210218_1204... OK Applying core.0017_auto_20210219_0211... OK Applying core.0018_auto_20210327_0952... OK Applying core.0019_auto_20210401_0654... OK Applying core.0020_auto_20210410_1031... OK Applying sessions.0001_initial... OK √ ./index.sqlite3 [*] Checking links from indexes and archive folders (safe to Ctrl+C)... [*] [2023-06-24 08:16:47] Writing 0 links to main index... √ ./index.sqlite3 ---------------------------------------------------------------------- [√] Done. 
A new ArchiveBox collection was initialized (0 links). Hint: To view your archive index, run: archivebox server # then visit http://127.0.0.1:8000 To add new links, you can run: archivebox add ~/some/path/or/url/to/list_of_links.txt For more usage and examples, run: archivebox help E:\Sync\Records\archivebox λ ls archive/ ArchiveBox.conf index.sqlite3 logs/ sources/ E:\Sync\Records\archivebox λ archivebox add https://archivebox.io/ [i] [2023-06-24 08:17:04] ArchiveBox v0.6.2: archivebox add https://archivebox.io/ > E:\Sync\Records\archivebox [!] Warning: Missing 4 recommended dependencies ! SINGLEFILE_BINARY: single-file (unable to detect version) Hint: To install all packages automatically run: archivebox setup or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False ! READABILITY_BINARY: readability-extractor (unable to detect version) Hint: To install all packages automatically run: archivebox setup or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False ! MERCURY_BINARY: mercury-parser (unable to detect version) Hint: To install all packages automatically run: archivebox setup or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False ! RIPGREP_BINARY: rg (unable to detect version) [+] [2023-06-24 08:17:05] Adding 1 links to index (crawl depth=0)... > Saved verbatim input to sources/E:\Sync\Records\archivebox\sources\1687594625-import.txt > Parsed 1 URLs from input (Generic TXT) > Found 1 new URLs not already in index [*] [2023-06-24 08:17:05] Writing 1 links to main index... √ ./index.sqlite3 [▶] [2023-06-24 08:17:05] Starting archiving of 1 snapshots in index... 
[+] [2023-06-24 08:17:05] "archivebox.io" https://archivebox.io/ > E:\Sync\Records\archivebox\archive\1687594625.565415 > title > favicon Extractor failed: AttributeError module 'os' has no attribute 'getpgid' Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; curl --silent --location --compressed --max-time 60 --output favicon.ico --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://www.google.com/s2/favicons?domain=archivebox.io > headers > wget Extractor failed: AttributeError module 'os' has no attribute 'getpgid' Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --timeout=60 --restrict-file-names=windows --warc-file=E:\Sync\Records\archivebox\archive\1687594625.565415\warc\1687594626 --page-requisites "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.21.1" --compression=auto https://archivebox.io/ > readability Extractor failed: FileNotFoundError [WinError 2] The system cannot find the file specified Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; readability-extractor ./{singlefile,dom}.html > mercury Extractor failed: FileNotFoundError [WinError 2] The system cannot find the file specified Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; mercury-parser https://archivebox.io/ --format=text > media Extractor failed: AttributeError module 'os' has no attribute 'getpgid' Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; youtube-dl 
--write-description --write-info-json --write-annotations --write-thumbnail --no-call-home --write-sub --all-subs --write-auto-sub --convert-subs=srt --yes-playlist --continue --ignore-errors --geo-bypass --add-metadata --max-filesize=750m https://archivebox.io/ > archive_org Extractor failed: AttributeError module 'os' has no attribute 'getpgid' Run to see full output: cd E:\Sync\Records\archivebox\archive\1687594625.565415; curl --silent --location --compressed --head --max-time 60 --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://web.archive.org/save/https://archivebox.io/ 5 files (349.9 KB) in 0:00:02s [√] [2023-06-24 08:17:08] Update of 1 pages complete (3.04 sec) - 0 links skipped - 1 links updated - 1 links had errors Hint: To manage your archive in a Web UI, run: archivebox server 0.0.0.0:8000 E:\Sync\Records\archivebox λ archivebox server 0.0.0.0:8001 [i] [2023-06-24 08:17:26] ArchiveBox v0.6.2: archivebox server 0.0.0.0:8001 > E:\Sync\Records\archivebox [+] Starting ArchiveBox webserver... > Logging errors to ./logs/errors.log [!] No admin users exist yet, you will not be able to edit links in the UI. To create an admin user, run: archivebox manage createsuperuser Performing system checks... System check identified no issues (0 silenced). June 24, 2023 - 08:17:27 Django version 3.1.14, using settings 'core.settings' Starting development server at http://0.0.0.0:8001/ Quit the server with CTRL-BREAK. 
"GET / HTTP/1.1" 302 0 "GET /public HTTP/1.1" 301 0 "GET /public/ HTTP/1.1" 200 7789 Not Found: /archive/1687594625.565415/favicon.ico Not Found: /archive/1687594625.565415/favicon.ico "GET /archive/1687594625.565415/index.html HTTP/1.1" 200 241603 Not Found: /archive/1687594625.565415/screenshot.png "GET /archive/1687594625.565415/screenshot.png HTTP/1.1" 404 1242 Not Found: /archive/1687594625.565415/favicon.ico Not Found: /archive/1687594625.565415/singlefile.html Not Found: /archive/1687594625.565415/output.pdf "GET /archive/1687594625.565415/singlefile.html HTTP/1.1" 404 1243 Not Found: /archive/1687594625.565415/readability/content.html Not Found: /archive/1687594625.565415/output.html "GET /archive/1687594625.565415/output.pdf HTTP/1.1" 404 1238 "GET /archive/1687594625.565415/headers.json HTTP/1.1" 200 882 "GET /archive/1687594625.565415/readability/content.html HTTP/1.1" 404 1252 "GET /archive/1687594625.565415/output.html HTTP/1.1" 404 1239 Not Found: /archive/1687594625.565415/mercury/content.html Not Found: /archive/1687594625.565415/git/ "GET /archive/1687594625.565415/mercury/content.html HTTP/1.1" 404 1248 Not Found: /archive/1687594625.565415/singlefile.html "GET /archive/1687594625.565415/git/ HTTP/1.1" 404 1232 "GET /archive/1687594625.565415/archivebox.io%5Cindex.html HTTP/1.1" 200 80378 "GET /archive/1687594625.565415/singlefile.html HTTP/1.1" 404 1243 "GET /archive/1687594625.565415/media/ HTTP/1.1" 200 401 Not Found: /assets/css/style.css Not Found: /assets/js/headsmart.min.js Not Found: /assets/js/modernizr.js "GET /assets/css/style.css?v=0d26538a4bea3671e3e2b7c202b1ebd72b564561 HTTP/1.1" 404 179 "GET /assets/js/headsmart.min.js HTTP/1.1" 404 179 "GET /assets/js/modernizr.js HTTP/1.1" 404 179 Not Found: /assets/css/non-screen.css Not Found: /assets/css/mobile.css "GET /assets/css/non-screen.css HTTP/1.1" 404 179 "GET /assets/css/mobile.css HTTP/1.1" 404 179 Not Found: /assets/js/headsmart.min.js "GET /assets/js/headsmart.min.js HTTP/1.1" 404 
179 E:\Sync\Records\archivebox λ pip --version pip 23.1.2 from C:\Users\Sophie\AppData\Roaming\Python\Python310\site-packages\pip (python 3.10) E:\Sync\Records\archivebox λ python --version Python 3.10.2 E:\Sync\Records\archivebox λ archivebox --version ArchiveBox v0.6.2 Cpython Windows Windows-10-10.0.19044-SP0 AMD64 IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep [i] Dependency versions: √ ARCHIVEBOX_BINARY v0.6.2 valid C:\Python\python-3.10.2\Scripts\archivebox.exe √ PYTHON_BINARY v3.10.2 valid C:\Python\python-3.10.2\python.exe √ DJANGO_BINARY v3.1.14 valid C:\Python\python-3.10.2\Lib\site-packages\django\bin\django-admin.py √ CURL_BINARY v7.55.1 valid C:\Windows\system32\curl.EXE √ WGET_BINARY v1.21.1 valid C:\msys64\usr\bin\wget.EXE √ NODE_BINARY v18.12.0 valid C:\node\node-v18.12.0\node.EXE X SINGLEFILE_BINARY ? invalid single-file X READABILITY_BINARY ? invalid readability-extractor X MERCURY_BINARY ? invalid mercury-parser √ GIT_BINARY v2.29.1. valid C:\apps\cmder.1.3.18\vendor\git-for-windows\cmd\git.EXE √ YOUTUBEDL_BINARY v2021.12.17 valid C:\Python\python-3.10.2\Scripts\youtube-dl.EXE - CHROME_BINARY - disabled X RIPGREP_BINARY ? invalid rg [i] Source-code locations: √ PACKAGE_DIR 23 files valid C:\Python\python-3.10.2\Lib\site-packages\archivebox √ TEMPLATES_DIR 3 files valid C:\Python\python-3.10.2\Lib\site-packages\archivebox\templates - CUSTOM_TEMPLATES_DIR - disabled [i] Secrets locations: - CHROME_USER_DATA_DIR - disabled - COOKIES_FILE - disabled [i] Data locations: √ OUTPUT_DIR 5 files valid E:\Sync\Records\archivebox √ SOURCES_DIR 1 files valid .\sources √ LOGS_DIR 1 files valid .\logs √ ARCHIVE_DIR 1 files valid .\archive √ CONFIG_FILE 84.0 Bytes valid .\ArchiveBox.conf √ SQL_INDEX 208.0 KB valid .\index.sqlite3 [!] Warning: Missing 4 recommended dependencies ! 
SINGLEFILE_BINARY: single-file (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_SINGLEFILE=False
! READABILITY_BINARY: readability-extractor (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_READABILITY=False
! MERCURY_BINARY: mercury-parser (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_MERCURY=False
! RIPGREP_BINARY: rg (unable to detect version)
```

Update: I also tried running `archivebox setup`. This also produced an unusual error. Running `archivebox update https://archivebox.io/` afterwards produced the same `os.getpgid` related errors.

```
E:\Sync\Records\archivebox
λ archivebox setup
[i] [2023-06-24 08:36:16] ArchiveBox v0.6.2: archivebox setup
> E:\Sync\Records\archivebox
[+] Creating new admin user for the Web UI...
Username (leave blank to use '[name]'): [name]
Email address: [email]
Password:
Password (again):
Superuser created successfully.
[+] Installing enabled ArchiveBox dependencies automatically...
Installing YOUTUBEDL_BINARY automatically using pip...
2021.12.17 is already installed youtube-dl
Installing CHROME_BINARY automatically using playwright...
Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
[X] Failed to install npm packages: [WinError 2] The system cannot find the file specified
Hint: Try deleting E:\Sync\Records\archivebox/node_modules and running it again

E:\Sync\Records\archivebox (archivebox@0.6.2)
λ ls
archive/ ArchiveBox.conf index.sqlite3 logs/ package.json sources/

E:\Sync\Records\archivebox (archivebox@0.6.2)
λ archivebox update https://archivebox.io/
[i] [2023-06-24 08:41:09] ArchiveBox v0.6.2: archivebox update https://archivebox.io/
> E:\Sync\Records\archivebox
[!] Warning: Missing 4 recommended dependencies
! SINGLEFILE_BINARY: single-file (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_SINGLEFILE=False
! READABILITY_BINARY: readability-extractor (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_READABILITY=False
! MERCURY_BINARY: mercury-parser (unable to detect version)
Hint: To install all packages automatically run:
archivebox setup
or to disable it and silence this warning:
archivebox config --set SAVE_MERCURY=False
! RIPGREP_BINARY: rg (unable to detect version)
[▶] [2023-06-24 08:41:10] Starting archiving of 1 snapshots in index...
[√] [2023-06-24 08:41:10] "ArchiveBox | � Open source self-hosted web archi ving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more…"
https://archivebox.io/
√ E:\Sync\Records\archivebox\archive\1687594625.565415
> favicon
Extractor failed:
AttributeError module 'os' has no attribute 'getpgid'
Run to see full output:
cd E:\Sync\Records\archivebox\archive\1687594625.565415;
curl --silent --location --compressed --max-time 60 --output favicon.ico --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://www.google.com/s2/favicons?domain=archivebox.io
> readability
Extractor failed:
FileNotFoundError [WinError 2] The system cannot find the file specified
Run to see full output:
cd E:\Sync\Records\archivebox\archive\1687594625.565415;
readability-extractor ./{singlefile,dom}.html
> archive_org
Extractor failed:
AttributeError module 'os' has no attribute 'getpgid'
Run to see full output:
cd E:\Sync\Records\archivebox\archive\1687594625.565415;
curl --silent --location --compressed --head --max-time 60 --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.61 Safari/537.36 ArchiveBox/0.6.2 (+https://github.com/ArchiveBox/ArchiveBox/) curl/curl 7.55.1 (Windows)" https://web.archive.org/save/https://archivebox.io/
5 files (352.8 KB) in 0:00:00s
[√] [2023-06-24 08:41:11] Update of 1 pages complete (0.38 sec)
- 0 links skipped
- 1 links updated
- 1 links had errors
Hint: To manage your archive in a Web UI, run:
archivebox server 0.0.0.0:8000

E:\Sync\Records\archivebox (archivebox@0.6.2)
λ npm --version
8.19.2

E:\Sync\Records\archivebox (archivebox@0.6.2)
λ node --version
v18.12.0
```
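For context on the `AttributeError` in the logs above: Python documents `os.getpgid` (and `os.killpg`) as available on Unix only, so any code path that calls them unconditionally fails on native Windows. The thread does not show how ArchiveBox invokes its extractors, so the following is only a minimal sketch of how such a call could be guarded. The helper names `process_group_id` and `run_with_timeout` are hypothetical and are not ArchiveBox functions.

```python
import os
import signal
import subprocess
import sys


def process_group_id(pid):
    """Return the process group ID of pid, or None where os.getpgid is missing."""
    getpgid = getattr(os, "getpgid", None)  # not defined on Windows
    if getpgid is None:
        return None
    return getpgid(pid)


def run_with_timeout(cmd, timeout):
    """Run cmd, killing it (and its process group where one exists) on timeout.

    Hypothetical helper for illustration; not taken from ArchiveBox.
    """
    # Only request a new session on platforms that have process groups.
    kwargs = {"start_new_session": True} if hasattr(os, "getpgid") else {}
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kwargs
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        pgid = process_group_id(proc.pid)
        if pgid is not None:
            # Unix: kill the whole group so child processes do not linger.
            os.killpg(pgid, signal.SIGKILL)
        else:
            # Windows fallback: only the direct child can be killed this way.
            proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    code, out, _ = run_with_timeout([sys.executable, "-c", "print('ok')"], 10)
    print(code, out.decode().strip())
```

The Windows branch is weaker than the Unix one, since grandchild processes may survive `proc.kill()`; that trade-off is one reason a native Windows port takes more than a one-line fix.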
kerem closed this issue 2026-03-01 14:45:50 +03:00

@pirate commented on GitHub (Jun 28, 2023):

Sorry, pip on Windows is not "fully" supported. Some people have gotten it working, which is why it's listed on the site, but I really only recommend/support Docker on Windows, especially since it's gotten harder for me to support native Windows in the most recent versions.

Check out WebRecorder.net's options or Polarized or some of the other software on our community Wiki list of alternatives if you need an easy Windows solution: https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-Community#Web-Archiving-Projects


@pineapplemachine commented on GitHub (Jun 28, 2023):

Can I please suggest tweaking the homepage a bit to make it clearer that Windows is not officially supported?

Ctrl+F on the homepage first finds:

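> You can set it up as a [command-line tool](https://archivebox.io/#quickstart), [web app](https://archivebox.io/#quickstart), and [desktop app](https://github.com/ArchiveBox/electron-archivebox) (alpha), on Linux, macOS, and Windows.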

Then Windows is mentioned in various setup options, including pip.

It wasn't until I was already well down the rabbit hole of trying to figure out why it wasn't working out-of-the-box on Windows that I finally spotted the following mention of Windows, much further down the page:

> Installing directly on Windows without Docker or WSL/WSL2/Cygwin is not officially supported (I cannot respond to Windows support tickets), but some advanced users have reported getting it working.

It might save people thinking of setting up on Windows a little time and headache if this was more clear. As for me, I had seen that and just thought I'd be able to set archivebox up quickly and experiment and see how it worked locally on my Windows desktop before setting it up on my Ubuntu server... It ended up just taking a lot more time and I didn't even quite get there in the end. (I almost did, with WSL, but then ran into an issue where archivebox did not like my chromium install and wanted a snap installation instead.)

Reference
starred/ArchiveBox#723