[GH-ISSUE #1009] Question: binary path in config #3652

Closed
opened 2026-03-14 23:54:32 +03:00 by kerem · 7 comments
Owner

Originally created by @Dontkickmi22 on GitHub (Aug 4, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1009

Hi, first post (this is how much I want it back)

Short version:
Can anyone please tell me where exactly ./node_modules/single-file/cli/single-file is located?
I need to specify these binary paths in my conf file.
Thank you

Long version:
Pi 4, Docker installation; it ran fine for the past few weeks, then broke on me out of the blue.
A snapshot quickly stopped at 6MB (in my experience that particular page should be way less than 6MB).
The error log says it was unable to install Chromium through Playwright.
I decided to skip Docker and install on the host (not going to troubleshoot that).
I ran init, updated Node.js, and ran setup. The output of archivebox version below looks fine.
But when trying to snapshot again, I got Errno 2 for chromium, singlefile, readability, and mercury.

    ...
    Extractor failed:
    FileNotFoundError [Errno 2] No such file or directory: 'single-file': 'single-file'
    Extractor failed:
    FileNotFoundError [Errno 2] No such file or directory: 'readability-extractor': 'readability-extractor'
    ...

So I looked up the Chromium binary location in the archivebox version output and added the line below to ArchiveBox.conf:
CHROME_BINARY = /usr/bin/chromium-browser
That fixed the chromium Errno 2. That was easy.

So I went ahead and did the same for singlefile, readability, and mercury in ArchiveBox.conf:

SINGLEFILE_BINARY = ./node_modules/single-file/cli/single-file
READABILITY_BINARY = ./node_modules/readability-extractor/readability-extractor
MERCURY_BINARY = ./node_modules/@postlight/mercury-parser/cli.js

The error stays. Pointers please!

Thank you

$ archivebox version
ArchiveBox v0.6.2
Cpython Linux Linux-5.10.103-v8+-aarch64-with-debian-10.12 aarch64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY   v0.6.2          valid     /usr/bin/archivebox
 √  PYTHON_BINARY       v3.7.3          valid     /usr/bin/python3.7
 √  DJANGO_BINARY       v3.1.14         valid     /usr/local/lib/python3.7/dist-packages/django/bin/django-admin.py
 √  CURL_BINARY         v7.64.0         valid     /usr/bin/curl
 √  WGET_BINARY         v1.20.1         valid     /usr/bin/wget
 √  NODE_BINARY         v16.16.0        valid     /usr/bin/node
 √  SINGLEFILE_BINARY   v1.0.11         valid     ./node_modules/single-file/cli/single-file
 √  READABILITY_BINARY  v0.0.4          valid     ./node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY      v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY          v2.20.1         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY    v2021.12.17     valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY       v92.0.4515.98   valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY      v0.10.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/lib/python3/dist-packages/archivebox                                   
 √  TEMPLATES_DIR         3 files         valid     /usr/lib/python3/dist-packages/archivebox/templates                         
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            7 files         valid     /media/archive                                                              
 √  SOURCES_DIR           0 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           0 files         valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             204.0 KB        valid     ./index.sqlite3                                                             
kerem closed this issue 2026-03-14 23:54:37 +03:00

@pirate commented on GitHub (Aug 4, 2022):

Set it to the absolute path of ./node_modules/@postlight/mercury-parser/cli.js not the relative one.

e.g. /Users/yourusername/archivebox/node_modules/@postlight/mercury-parser/cli.js
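To apply the absolute-path advice to all three extractors at once, a small shell sketch like the following could print the lines to paste into ArchiveBox.conf. It assumes that node_modules sits directly inside the directory you pass in (for example your ArchiveBox data folder); that location is an assumption, and the function name is made up for illustration.

```shell
#!/bin/sh
# Print ArchiveBox.conf lines with absolute paths for the three Node-based
# extractors. Assumption: "$1" is the directory that contains node_modules.
print_absolute_binaries() {
    # Resolve the argument to an absolute path; fail if the directory is missing.
    data_dir=$(cd "$1" && pwd) || return 1
    printf 'SINGLEFILE_BINARY = %s\n'  "$data_dir/node_modules/single-file/cli/single-file"
    printf 'READABILITY_BINARY = %s\n' "$data_dir/node_modules/readability-extractor/readability-extractor"
    printf 'MERCURY_BINARY = %s\n'     "$data_dir/node_modules/@postlight/mercury-parser/cli.js"
}

# Example call (the path is only an example):
#   print_absolute_binaries /media/archive
```

The function only builds the strings; it does not check that the files exist, so it is worth confirming each printed path with ls before saving the config.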


@Dontkickmi22 commented on GitHub (Aug 5, 2022):

Hi pirate,

Appreciate your input.

Short version: how can I reinstall those three programs? I mean uninstall and then reinstall.
Thank you

Long version:
A quick find shows that those directories only exist deep under /var/lib/docker/overlay/...

sudo find / -type d -name 'single-file' -print
/var/lib/docker/overlay2/2890280801cbc09c113a55412143e8cfa33ca15482844ed75edbf8a3eceaad49/diff/node/node_modules/single-file
/var/lib/docker/overlay2/2890280801cbc09c113a55412143e8cfa33ca15482844ed75edbf8a3eceaad49/diff/node/node_modules/single-file/lib/single-file
/var/lib/docker/overlay2/2890280801cbc09c113a55412143e8cfa33ca15482844ed75edbf8a3eceaad49/diff/node/node_modules/single-file/extension/lib/single-file
/var/lib/docker/overlay2/1337d919d623720d99fd9abbbc5a4ddcd5514395a3a1e305701eab29d4954b77/diff/app/node_modules/single-file
/var/lib/docker/overlay2/1337d919d623720d99fd9abbbc5a4ddcd5514395a3a1e305701eab29d4954b77/diff/app/node_modules/single-file/lib/single-file
/var/lib/docker/overlay2/1337d919d623720d99fd9abbbc5a4ddcd5514395a3a1e305701eab29d4954b77/diff/app/node_modules/single-file/extension/lib/single-file

However, I did the installation again under the user pi with apt install archivebox.
So I decided the quickest way to finish this would be to reinstall these 3 programs. I ran:

npm install --no-audit --no-fund 'git+https://github.com/gildas-lormeau/SingleFile.git'

Returns:

...
npm ERR! Tracker "idealTree" already exists
npm ERR! A complete log of this run can be found in:
npm ERR!     /home/pi/.npm/_logs/2022-08-05T00_20_56_807Z-debug-0.log
...

The log mentioned above:

...
verbose stack Error: Tracker "idealTree" already exists
43 verbose stack     at Arborist.[_onError] (/usr/lib/node_modules/npm/node_modules/@npmcli/arborist/lib/tracker.js:100:11)
43 verbose stack     at Arborist.addTracker (/usr/lib/node_modules/npm/node_modules/@npmcli/arborist/lib/tracker.js:27:21)
43 verbose stack     at Arborist.[buildDeps] (/usr/lib/node_modules/npm/node_modules/@npmcli/arborist/lib/arborist/build-ideal-tree.js:823:10)
43 verbose stack     at Arborist.buildIdealTree (/usr/lib/node_modules/npm/node_modules/@npmcli/arborist/lib/arborist/build-ideal-tree.js:218:29)
43 verbose stack     at async Promise.all (index 1)
43 verbose stack     at async Arborist.reify (/usr/lib/node_modules/npm/node_modules/@npmcli/arborist/lib/arborist/reify.js:153:5)
43 verbose stack     at async Install.exec (/usr/lib/node_modules/npm/lib/commands/install.js:156:5)
43 verbose stack     at async module.exports (/usr/lib/node_modules/npm/lib/cli.js:78:5)
...

Thank you


@pirate commented on GitHub (Aug 5, 2022):

If you're running in Docker you shouldn't need to mess with or install those 3 manually, but outside of Docker you should just need to run archivebox init --setup to fix it, and it will do the npm install for you.
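After running setup, a quick check like the sketch below can confirm whether the three extractor files are actually present. It assumes they live under node_modules in the directory you pass in, as the relative paths in the archivebox version output earlier in this thread suggest; the function name is hypothetical.

```shell
#!/bin/sh
# Report which of the three Node-based extractors are missing under a directory.
# Assumption: they are installed at "$1"/node_modules/...
missing_extractors() {
    for rel in \
        node_modules/single-file/cli/single-file \
        node_modules/readability-extractor/readability-extractor \
        node_modules/@postlight/mercury-parser/cli.js
    do
        # Print a line only for files that do not exist.
        [ -e "$1/$rel" ] || printf 'missing: %s\n' "$1/$rel"
    done
}

# Example call (the path is only an example):
#   missing_extractors /media/archive
```

No output means all three files were found at the assumed location.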


@Dontkickmi22 commented on GitHub (Aug 5, 2022):

Thank you.
Yeah, I was running in Docker, but now I am ditching it.
I did go through the installation process for running on the host (the same host) though.
Your command took care of those 3 programs for me by reinstalling them automatically.
That was very nice.
But now when I use the Firefox extension to archive, the console returns

Not Found: //add/
"POST //add/ HTTP/1.1" 404 179

By the looks of it, I believe it is saying the host is missing the add function?

So, on the host console, I typed

archivebox add ' xxxxxxxx.xxx/xxx'

The console reacted normally; it just went and did its thing and completed the process soon after.
I checked the web UI, and it shows a new entry with content.

It's just the extension that isn't functioning, hmmm

@Dontkickmi22 commented on GitHub (Aug 7, 2022):

By the way, I get a server error 500 when trying to delete a snapshot.
The output below seems fine. I'm not supposed to find that directory and delete it manually, right?

archivebox status
[*] Scanning archive main index...
    /media/archivebox/* 
    Index size: 2.6 MB across 3 files

    > SQL Main Index: 296 links      (found in index.sqlite3)
    > JSON Link Details: 296 links   (found in archive/*/index.json)

[*] Scanning archive data directories...
    /media/archivebox/archive/* 
    Size: 3.8 GB across 11540 files in 3111 directories

    > indexed: 296                   (indexed links without checking archive status or data directory validity)
      > archived: 296                (indexed links that are archived with a valid data directory)
      > unarchived: 0                (indexed links that are unarchived with no data directory or an empty data directory)

    > present: 296                   (dirs that actually exist in the archive/ folder)
      > valid: 296                   (dirs with a valid index matched to the main index and archived content)
      > invalid: 0                   (dirs that are invalid for any reason: corrupted/duplicate/orphaned/unrecognized)
        > duplicate: 0               (dirs that conflict with other directories that have the same link URL or timestamp)
        > orphaned: 0                (dirs that contain a valid index but aren't listed in the main index)
        > corrupted: 0               (dirs that don't contain a valid index and aren't listed in the main index)
        > unrecognized: 0            (dirs that don't contain recognizable archive data and aren't listed in the main index)

I wonder if I should open a ticket?

P.S. I get Server Error 500 through the web UI for both add and delete. I will probably reinstall at a later date; for the moment I have zero issues performing these through console commands. Cheers~


@pirate commented on GitHub (Aug 9, 2022):

Please screenshot your extension configuration. I suspect you did not enter the ArchiveBox host correctly, or at all, as it's empty according to the error output above.
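As an illustration only, here is one way a request path like //add/ can come out of a misconfigured base URL. This is not the extension's actual code; both helper functions and the example URL are hypothetical, and the sketch simply shows how a base value ending in a slash produces a doubled slash when /add/ is appended.

```shell
#!/bin/sh
# Hypothetical helper: append the /add/ endpoint to a base URL as-is.
naive_join() { printf '%s/add/\n' "$1"; }

# Hypothetical helper: strip trailing slashes from the base before appending.
safe_join() {
    base=$1
    while [ "${base%/}" != "$base" ]; do base=${base%/}; done
    printf '%s/add/\n' "$base"
}

naive_join 'http://localhost:8000/'   # prints http://localhost:8000//add/
safe_join  'http://localhost:8000/'   # prints http://localhost:8000/add/
```

If the configured value in the extension ends with a slash, removing it is a cheap thing to try before digging further.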


@Dontkickmi22 commented on GitHub (Aug 9, 2022):

Here's the shot.

![Screen Shot 2022-08-09 at 6 34 09 PM](https://user-images.githubusercontent.com/110583321/183627764-c0e0b0ae-46cd-45e1-872d-ff1ce32d0515.png)

I found out I can remove a snapshot in the console with the command archivebox remove. It's the web UI that gives me server error 500.