[GH-ISSUE #805] Pip dist: archivebox setup failed due to "[WinError 2] The system cannot find the file specified" on Win10 #3526

Closed
opened 2026-03-14 23:21:59 +03:00 by kerem · 10 comments
Owner

Originally created by @fireattack on GitHub (Jul 22, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/805

E:\archivebox>archivebox setup
[i] [2021-07-22 01:40:22] ArchiveBox v0.6.2: archivebox setup
    > E:\archivebox


[+] Installing enabled ArchiveBox dependencies automatically...

    Installing YOUTUBEDL_BINARY automatically using pip...
2021.02.10 is already installed youtube-dl

    Installing CHROME_BINARY automatically using playwright...

    Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
[X] Failed to install npm packages: [WinError 2] The system cannot find the file specified
    Hint: Try deleting E:\archivebox/node_modules and running it again

(The hint says "Try deleting E:\archivebox/node_modules and running it again", but there is no node_modules subdirectory.)
archivebox init --setup fails the same way.

I should have all the required tools installed:

E:\archivebox>npm --version
7.20.0

E:\archivebox>node --version
v15.4.0

E:\archivebox>python --version
Python 3.9.1
kerem 2026-03-14 23:21:59 +03:00
Author
Owner

@pirate commented on GitHub (Jul 24, 2021):

Have you tried running AB anyway (skipping finishing setup)? It may work partially, just without a few extractors.

Running directly on Windows without WSL2/Docker/Cygwin is not officially supported. Unfortunately I can't help much, as I don't have a Windows system to test on easily.


@barnett2010 commented on GitHub (Jul 29, 2021):

win10 cmd

pip install archivebox .........................finish.
archivebox setup

C:\webarchive1>archivebox setup
[i] [2021-07-29 05:44:55] ArchiveBox v0.6.2: archivebox setup
    > C:\webarchive1


[+] Installing enabled ArchiveBox dependencies automatically...

    Installing YOUTUBEDL_BINARY automatically using pip...
2021.06.06 is already installed youtube-dl

    Installing CHROME_BINARY automatically using playwright...

    Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
[X] Failed to install npm packages: [WinError 2] The system can not find the file specified。
    Hint: Try deleting C:\webarchive1/node_modules and running it again


C:\webarchive1>node -v
v14.17.3

C:\webarchive1>npm -v
7.20.2

C:\webarchive1>where npm
C:\Program Files\nodejs\npm
C:\Program Files\nodejs\npm.cmd
C:\Program Files\nodejs\node_global\npm
C:\Program Files\nodejs\node_global\npm.cmd

C:\webarchive1>where node
C:\Program Files\nodejs\node.exe
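
A side note on the `where npm` output above: on Windows, `npm` is installed as an `npm.cmd` shim rather than an `.exe`. One possible explanation for the `[WinError 2]` (an assumption, not confirmed anywhere in this thread) is that the setup step launches `npm` by its bare name without a shell, in which case Windows process creation does not try the `.cmd` extension and reports that the file cannot be found. The sketch below, in Python, shows the general workaround of resolving the command with `shutil.which` before running it. `resolve_command` and `run_resolved` are hypothetical helper names, not ArchiveBox functions.

```python
import shutil
import subprocess
import sys


def resolve_command(name):
    """Return the full path of `name` as found on PATH, or None if missing.

    On Windows, shutil.which() also tries the extensions listed in PATHEXT,
    so "npm" can resolve to a path ending in "npm.cmd".
    """
    return shutil.which(name)


def run_resolved(name, *args):
    """Run a command by its resolved path and return its stdout as text."""
    path = resolve_command(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    result = subprocess.run(
        [path, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Harmless demo: ask the current Python interpreter for its version.
    print(run_resolved(sys.executable, "--version"))
```

Calling `run_resolved("npm", "--version")` would then run whichever `npm` or `npm.cmd` is first on `PATH`, or raise a clear error if none is found.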

@barnett2010 commented on GitHub (Jul 29, 2021):

If the npm modules are not installed,
the pdf, screenshot, and singlefile outputs are all missing.


@Explorare commented on GitHub (Jul 30, 2021):

Ran into the same issue today.

[env]
Windows 10 19043.1110
Python 3.9.6
Node 16.6.0
npm 7.19.1

I tried manually installing mercury / singlefile / readability using npm, but the program still didn't recognize them.


@pirate commented on GitHub (Jul 31, 2021):

Please post the full unredacted output of archivebox --version.


@Explorare commented on GitHub (Jul 31, 2021):

$ archivebox --version
ArchiveBox v0.6.2
Cpython Windows Windows-10-10.0.19043-SP0 AMD64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\archivebox.exe
 √  PYTHON_BINARY         v3.9.6          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\python.exe    
 √  DJANGO_BINARY         v3.1.13         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\django\bin\django-admin.py
 √  CURL_BINARY           v7.55.1         valid     C:\Windows\System32\curl.EXE                                        
 X  WGET_BINARY           ?               invalid   wget                                                                
 √  NODE_BINARY           v16.6.0         valid     "C:\Program Files\nodejs\node.EXE"                                  
 X  SINGLEFILE_BINARY     ?               invalid   .\node_modules\.bin\single-file                                     
 X  READABILITY_BINARY    ?               invalid   .\node_modules\.bin\readability-extractor                           
 √  MERCURY_BINARY        v1.0.0          valid     .\node_modules\.bin\mercury-parser

 √  GIT_BINARY            v2.31.1.        valid     "C:\Program Files\Git\cmd\git.EXE"

 √  YOUTUBEDL_BINARY      v2021.06.06     valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\youtube-dl.EXE
 -  CHROME_BINARY         -               disabled

 X  RIPGREP_BINARY        ?               invalid   rg


[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox
 √  TEMPLATES_DIR         3 files         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox\templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled


[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled

 -  COOKIES_FILE          -               disabled


[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     C:\Users\explo\Documents\Tools\ArchiveBox

 √  SOURCES_DIR           3 files         valid     .\sources

 √  LOGS_DIR              1 files         valid     .\logs

 √  ARCHIVE_DIR           2 files         valid     .\archive

 √  CONFIG_FILE           84.0 Bytes      valid     .\ArchiveBox.conf

 √  SQL_INDEX             212.0 KB        valid     .\index.sqlite3


[!] Warning: Missing 4 recommended dependencies
    ! WGET_BINARY: wget (unable to detect version)
    ! SINGLEFILE_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! RIPGREP_BINARY: rg (unable to detect version)
$ archivebox setup
[i] [2021-07-31 07:27:14] ArchiveBox v0.6.2: archivebox setup
    > C:\Users\explo\Documents\Tools\ArchiveBox


[+] Installing enabled ArchiveBox dependencies automatically...

    Installing YOUTUBEDL_BINARY automatically using pip...
2021.06.06 is already installed youtube-dl

    Installing CHROME_BINARY automatically using playwright...

    Installing SINGLEFILE_BINARY, READABILITY_BINARY, MERCURY_BINARY automatically using npm...
[X] Failed to install npm packages: [WinError 2] The system cannot find the file specified
    Hint: Try deleting C:\Users\explo\Documents\Tools\ArchiveBox/node_modules and running it again

@Explorare commented on GitHub (Jul 31, 2021):

Tried to install the dependencies manually, but the check still reports them as not found. The SingleFile dependency validated once but became invalid on the next check. Don't know why.

$ npm install -g "gildas-lormeau/SingleFile#master"

added 146 packages, and audited 147 packages in 1m

9 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
$ archivebox --version
ArchiveBox v0.6.2
Cpython Windows Windows-10-10.0.19043-SP0 AMD64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\archivebox.exe
 √  PYTHON_BINARY         v3.9.6          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\python.exe

 √  DJANGO_BINARY         v3.1.13         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\django\bin\django-admin.py
 √  CURL_BINARY           v7.55.1         valid     C:\Windows\System32\curl.EXE

 X  WGET_BINARY           ?               invalid   wget

 √  NODE_BINARY           v16.6.0         valid     "C:\Program Files\nodejs\node.EXE"

 √  SINGLEFILE_BINARY     v0.3.26         valid     C:\Users\explo\AppData\Roaming\npm\single-file.cmd

 X  READABILITY_BINARY    ?               invalid   readability-extractor

 X  MERCURY_BINARY        ?               invalid   mercury-parser

 √  GIT_BINARY            v2.31.1.        valid     "C:\Program Files\Git\cmd\git.EXE"

 √  YOUTUBEDL_BINARY      v2021.06.06     valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\youtube-dl.EXE
 -  CHROME_BINARY         -               disabled

 X  RIPGREP_BINARY        ?               invalid   rg


[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox
 √  TEMPLATES_DIR         3 files         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox\templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled


[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled

 -  COOKIES_FILE          -               disabled


[i] Data locations:
 √  OUTPUT_DIR            7 files         valid     C:\Users\explo\Documents\Tools\ArchiveBox

 √  SOURCES_DIR           3 files         valid     .\sources

 √  LOGS_DIR              1 files         valid     .\logs

 √  ARCHIVE_DIR           2 files         valid     .\archive

 √  CONFIG_FILE           84.0 Bytes      valid     .\ArchiveBox.conf

 √  SQL_INDEX             212.0 KB        valid     .\index.sqlite3


[!] Warning: Missing 4 recommended dependencies
    ! WGET_BINARY: wget (unable to detect version)
    ! READABILITY_BINARY: readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! MERCURY_BINARY: mercury-parser (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_MERCURY=False

    ! RIPGREP_BINARY: rg (unable to detect version)

$ npm install -g 'git+https://github.com/pirate/readability-extractor'

added 64 packages, and audited 65 packages in 11s

found 0 vulnerabilities
 explo@THINKPAD  ~\..\..\ArchiveBox  npm install @postlight/mercury-parser
npm WARN deprecated request-promise-native@1.0.9: request-promise-native has been deprecated because it extends the now deprecated request package, see https://github.com/request/request/issues/3142
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated request-promise@4.2.6: request-promise has been deprecated because it extends the now deprecated request package, see https://github.com/request/request/issues/3142
npm WARN deprecated left-pad@1.3.0: use String.prototype.padStart()
npm WARN deprecated querystring@0.2.0: The querystring API is considered Legacy. new code should use the URLSearchParams API instead.
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN deprecated core-js@2.6.12: core-js@<3.3 is no longer maintained and not recommended for usage due to the number of issues. Because of the V8 engine whims, feature detection in old core-js versions could cause a slowdown up to 100x even if nothing is polyfilled. Please, upgrade your dependencies to the actual version of core-js.

added 272 packages, and audited 279 packages in 8s

11 packages are looking for funding
  run `npm fund` for details

2 moderate severity vulnerabilities

Some issues need review, and may require choosing
a different dependency.

Run `npm audit` for details.


$ archivebox --version
ArchiveBox v0.6.2
Cpython Windows Windows-10-10.0.19043-SP0 AMD64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\archivebox.exe
 √  PYTHON_BINARY         v3.9.6          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\python.exe

 √  DJANGO_BINARY         v3.1.13         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\django\bin\django-admin.py
 √  CURL_BINARY           v7.55.1         valid     C:\Windows\System32\curl.EXE

 X  WGET_BINARY           ?               invalid   wget

 √  NODE_BINARY           v16.6.0         valid     "C:\Program Files\nodejs\node.EXE"

 X  SINGLEFILE_BINARY     ?               invalid   .\node_modules\.bin\single-file

 X  READABILITY_BINARY    ?               invalid   .\node_modules\.bin\readability-extractor

 √  MERCURY_BINARY        v1.0.0          valid     .\node_modules\.bin\mercury-parser

 √  GIT_BINARY            v2.31.1.        valid     "C:\Program Files\Git\cmd\git.EXE"

 √  YOUTUBEDL_BINARY      v2021.06.06     valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\youtube-dl.EXE
 -  CHROME_BINARY         -               disabled

 X  RIPGREP_BINARY        ?               invalid   rg


[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox
 √  TEMPLATES_DIR         3 files         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox\templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled


[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled

 -  COOKIES_FILE          -               disabled


[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     C:\Users\explo\Documents\Tools\ArchiveBox

 √  SOURCES_DIR           3 files         valid     .\sources

 √  LOGS_DIR              1 files         valid     .\logs

 √  ARCHIVE_DIR           2 files         valid     .\archive

 √  CONFIG_FILE           84.0 Bytes      valid     .\ArchiveBox.conf

 √  SQL_INDEX             212.0 KB        valid     .\index.sqlite3


[!] Warning: Missing 4 recommended dependencies
    ! WGET_BINARY: wget (unable to detect version)
    ! SINGLEFILE_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! RIPGREP_BINARY: rg (unable to detect version)

@pirate commented on GitHub (Aug 2, 2021):

Can you post the output of running these commands inside the archivebox directory:

readability-extractor --version
mercury-parser --version
single-file --version

You may need to prepend ./node_modules/.bin like so:

./node_modules/.bin/readability-extractor --version
./node_modules/.bin/mercury-parser --version
./node_modules/.bin/single-file --version
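
The manual check above can also be scripted. This is a minimal sketch in Python, assuming the three binary names from the commands above and the `node_modules/.bin` layout shown in the `archivebox --version` output. `find_binary` and `version_of` are hypothetical helpers, not part of ArchiveBox.

```python
import shutil
import subprocess
from pathlib import Path

# Binary names taken from the commands above.
BINARIES = ["readability-extractor", "mercury-parser", "single-file"]


def find_binary(name, data_dir="."):
    """Look in <data_dir>/node_modules/.bin first, then fall back to PATH."""
    local_bin = Path(data_dir) / "node_modules" / ".bin"
    return shutil.which(name, path=str(local_bin)) or shutil.which(name)


def version_of(name, data_dir="."):
    """Return the first line printed by `<name> --version`, or None if not found."""
    path = find_binary(name, data_dir)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for binary in BINARIES:
        print(binary, "->", find_binary(binary), version_of(binary))
```

A tool that answers `--version` with usage text instead of a version number will yield the first line of that text, so the output still needs to be read by a person.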

@Explorare commented on GitHub (Aug 2, 2021):

I installed mercury-parser again, globally this time, and it is now recognized, but no luck with the rest.

 explo@THINKPAD  ~\..\..\ArchiveBox  readability-extractor --version
0.0.3
 explo@THINKPAD  ~\..\..\ArchiveBox  mercury-parser --version

mercury-parser

    The Mercury Parser extracts semantic content from any url

Usage:

    $ mercury-parser url-to-parse [--format=html|text|markdown] [--header.name=value]... [--extend type=selector]... [--extend-list type=selector]... [--add-extractor path_to_extractor.js]...


 explo@THINKPAD  ~\..\..\ArchiveBox  single-file --version
0.3.26
 explo@THINKPAD  ~\..\..\ArchiveBox  archivebox --version
ArchiveBox v0.6.2
Cpython Windows Windows-10-10.0.19043-SP0 AMD64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\archivebox.exe
 √  PYTHON_BINARY         v3.9.6          valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\python.exe

 √  DJANGO_BINARY         v3.1.13         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\django\bin\django-admin.py
 √  CURL_BINARY           v7.55.1         valid     C:\Windows\System32\curl.EXE

 X  WGET_BINARY           ?               invalid   wget

 √  NODE_BINARY           v16.6.0         valid     "C:\Program Files\nodejs\node.EXE"

 X  SINGLEFILE_BINARY     ?               invalid   .\node_modules\.bin\single-file

 X  READABILITY_BINARY    ?               invalid   .\node_modules\.bin\readability-extractor

 √  MERCURY_BINARY        v1.0.0          valid     .\node_modules\.bin\mercury-parser

 √  GIT_BINARY            v2.31.1.        valid     "C:\Program Files\Git\cmd\git.EXE"

 √  YOUTUBEDL_BINARY      v2021.06.06     valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Scripts\youtube-dl.EXE
 -  CHROME_BINARY         -               disabled

 X  RIPGREP_BINARY        ?               invalid   rg


[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox
 √  TEMPLATES_DIR         3 files         valid     C:\Users\explo\AppData\Local\Programs\Python\Python39\Lib\site-packages\archivebox\templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled


[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled

 -  COOKIES_FILE          -               disabled


[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     C:\Users\explo\Documents\Tools\ArchiveBox

 √  SOURCES_DIR           3 files         valid     .\sources

 √  LOGS_DIR              1 files         valid     .\logs

 √  ARCHIVE_DIR           2 files         valid     .\archive

 √  CONFIG_FILE           84.0 Bytes      valid     .\ArchiveBox.conf

 √  SQL_INDEX             212.0 KB        valid     .\index.sqlite3


[!] Warning: Missing 4 recommended dependencies
    ! WGET_BINARY: wget (unable to detect version)
    ! SINGLEFILE_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\single-file (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_SINGLEFILE=False

    ! READABILITY_BINARY: C:\Users\explo\Documents\Tools\ArchiveBox\node_modules\.bin\readability-extractor (unable to detect version)
      Hint: To install all packages automatically run: archivebox setup
            or to disable it and silence this warning: archivebox config --set SAVE_READABILITY=False

    ! RIPGREP_BINARY: rg (unable to detect version)

@pirate commented on GitHub (Aug 4, 2021):

 -  CHROME_BINARY         -               disabled

I just noticed you don't have CHROME enabled, which means SingleFile won't work anyway (it needs Chrome). Readability and Mercury may be able to work using the wget output, but you also don't have wget installed, so all three are pointless. Without either wget or Chrome, ArchiveBox is really not doing much; most of the extractors will fail.

I highly recommend running ArchiveBox in Docker instead. Running it directly on Windows without Docker/WSL/WSL2 is really not supported, as you'll run into all kinds of dependency problems such as the ones you encountered here.

I'm going to close this as wontfix because I don't want to set a precedent of supporting ArchiveBox directly on Windows, or I will open Pandora's box and have far too many support tickets given its lackluster compatibility right now. Docker on Windows (https://github.com/ArchiveBox/ArchiveBox#%EF%B8%8F-easy-setup) is the only Windows install method I'm willing to provide support for.

Sorry for the hassle, but I promise in the long run you will have fewer issues and a more secure setup running it in Docker vs without.
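
To make the dependency reasoning in the comment above concrete, here is a minimal sketch of how one might check which extractors can run given the binaries that are available. The `REQUIREMENTS` table and both function names are hypothetical illustrations, not ArchiveBox's actual code or API. The table simplifies the comment's description: SingleFile is assumed to need Chrome, and Readability and Mercury are assumed to need only wget output.

```python
# Hypothetical sketch: names here are illustrative, not ArchiveBox's real API.

# What each extractor needs, simplified from the comment above:
# SingleFile needs Chrome; Readability and Mercury may work from wget output.
REQUIREMENTS = {
    "singlefile": {"chrome"},
    "readability": {"wget"},
    "mercury": {"wget"},
}


def usable_extractors(available):
    """Return the sorted names of extractors whose requirements are all met."""
    available = set(available)
    return sorted(
        name for name, needs in REQUIREMENTS.items() if needs <= available
    )


def missing_requirements(available):
    """Map each blocked extractor to the sorted requirements it lacks."""
    available = set(available)
    return {
        name: sorted(needs - available)
        for name, needs in REQUIREMENTS.items()
        if not needs <= available
    }


if __name__ == "__main__":
    # State from the report: Chrome disabled, wget not installed.
    reported = {"curl", "git", "node"}
    print(usable_extractors(reported))
    print(missing_requirements(reported))
```

With neither `chrome` nor `wget` in the available set, no extractor in this table qualifies, which matches the point that installing the npm packages alone would not have helped.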
