[GH-ISSUE #920] Question: Snapshot exists in DB, but resource /singlefile.html does not exist in snapshot dir yet. #2082

Closed
opened 2026-03-01 17:56:20 +03:00 by kerem · 4 comments

Originally created by @2600box on GitHub (Jan 31, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/920

I have just set up ArchiveBox on Ubuntu 20.04 with the script. There were no package errors and everything looks good.

When I archive a site, I can see different results, but singlefile, chrome and git show this message:


Snapshot [1643631596.804901] exists in DB, but resource 1643631596.804901/singlefile.html does not exist in snapshot dir yet.

Maybe this resource type is not availabe for this Snapshot,
or the archiving process has not completed yet?
# run this cmd to finish archiving this Snapshot
archivebox update -t timestamp 1643631596.804901

I can see in the log errors like: Not Found: /archive/1643631596.804901/singlefile.html

If I run the suggested command, it does not fix the issue and this is the result:

$ archivebox update -t timestamp 1643631596.804901
[i] [2022-01-31 12:28:07] ArchiveBox v0.6.2: archivebox update -t timestamp 1643631596.804901
    > /home/archivebox/archivebox


[_] [2022-01-31 12:28:08] Starting archiving of 1 snapshots in index...

[_] [2022-01-31 12:28:08] "Deep Fakes: Art and Its Double - EPFL Pavilions"
    https://epfl-pavilions.ch/exhibitions/deep-fakes-art-and-its-double
    _ ./archive/1643631596.804901
        61 files (7.8 MB) in 0:00:00s

[_] [2022-01-31 12:28:09] Update of 1 pages complete (0.22 sec)
    - 1 links skipped
    - 0 links updated
    - 0 links had errors

    Hint: To manage your archive in a Web UI, run:
        archivebox server 0.0.0.0:8000

Is this a configuration issue? What am I doing wrong?

Thanks!

kerem closed this issue 2026-03-01 17:56:20 +03:00

@pirate commented on GitHub (Mar 16, 2022):

Please post the full output of archivebox --version.


@rickcecil commented on GitHub (Aug 29, 2022):

So I am experiencing this error as well. The full output of "archivebox --version" is below.

Any help would be greatly appreciated.

Some additional details: I am importing new links from text files through the CLI. There are between 1,000 and 2,500 links per text file. The logs indicate that all download attempts have been successful.

From the UI, the system indicates that it is successful, but "size" is blank.

If I pull these items again, it works successfully.

It is difficult to determine how many items have failed, but it appears to be around a thousand or more.

Let me know if you need any additional information.

Right now, my solution is to go back through page-by-page and re-pull the pages. It is somewhat tedious, but not too bad.

It would be great if you had a) additional troubleshooting tips; and b) instructions (if possible) on how to re-pull the items that are missing data. (I investigated this option but did not find anything. I will look more later. Sometimes I miss things that are right in front of me.)

0.6.3
ArchiveBox v0.6.3 Cpython Linux Linux-5.15.0-46-generic-x86_64-with-glibc2.29 x86_64
DEBUG=False IN_DOCKER=False IS_TTY=True TZ=UTC FS_ATOMIC=True FS_REMOTE=False FS_PERMS=644 1000:1000 SEARCH_BACKEND=ripgrep

[i] Dependency versions:
 √  PYTHON_BINARY         v3.8.10         valid     /usr/bin/python3.8                                                          
 √  SQLITE_BINARY         v2.6.0          valid     /usr/lib/python3.8/sqlite3/dbapi2.py                                        
 √  DJANGO_BINARY         v3.1.14         valid     /home/rick/.local/lib/python3.8/site-packages/django/__init__.py            
 √  ARCHIVEBOX_BINARY     v0.6.3          valid     /home/rick/.local/bin/archivebox                                            

 √  CURL_BINARY           v7.68.0         valid     /usr/bin/curl                                                               
 -  WGET_BINARY           -               disabled  /usr/bin/wget                                                               
 √  NODE_BINARY           v17.6.0         valid     /home/rick/.nvm/versions/node/v17.6.0/bin/node                              
 √  SINGLEFILE_BINARY     v1.0.16         valid     /home/rick/apps/single-file-cli/single-file                                 
 -  READABILITY_BINARY    -               disabled  ./node_modules/readability-extractor/readability-extractor                  
 -  MERCURY_BINARY        -               disabled  ./node_modules/@postlight/mercury-parser/cli.js                             
 -  GIT_BINARY            -               disabled  /usr/bin/git                                                                
 -  YOUTUBEDL_BINARY      -               disabled  /home/rick/.local/bin/youtube-dl                                            
 √  CHROME_BINARY         v104.0.5112.101  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v11.0.2         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /home/rick/.local/lib/python3.8/site-packages/archivebox                    
 √  TEMPLATES_DIR         3 files         valid     /home/rick/.local/lib/python3.8/site-packages/archivebox/templates          
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  48 files        valid     /home/rick/snap/chromium/common/chromium                                    
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            17 files        valid     /home/rick/appdata/archivebox/data                                          
 √  SOURCES_DIR           147 files       valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           8018 files      valid     ./archive                                                                   
 √  CONFIG_FILE           1.1 KB          valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             67.1 MB         valid     ./index.sqlite3                                                             
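
On the question above about re-pulling the items that are missing data: a minimal sketch, assuming the `./archive/<timestamp>/` layout shown in the error message and that a missing `singlefile.html` is a good enough signal for an incomplete snapshot. The helper names `snapshots_missing` and `update_commands` are hypothetical and not part of ArchiveBox. The script only prints the `archivebox update -t timestamp ...` command from the error message for each affected snapshot. As the original report shows, that command may still skip a snapshot, so this is a way to list candidates rather than a guaranteed fix.

```python
from pathlib import Path


def snapshots_missing(archive_dir, resource="singlefile.html"):
    """Return timestamps of snapshot directories that lack the given resource file."""
    root = Path(archive_dir)
    if not root.is_dir():
        # Nothing to scan, e.g. the script was not started from the data directory.
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not (entry / resource).exists()
    )


def update_commands(timestamps):
    """Build the per-snapshot command that the error message suggests."""
    return [f"archivebox update -t timestamp {ts}" for ts in timestamps]


# Assumption: the script is run from the ArchiveBox data directory,
# so the snapshots live under ./archive.
for command in update_commands(snapshots_missing("./archive")):
    print(command)
```

The printed commands can be reviewed first and then run one at a time, which also gives a count of how many snapshots are affected.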



@vcudachi commented on GitHub (Aug 27, 2024):

Almost the same issue for me. I have added a link to archive the web page. Got the result: ✅ 4 ❌ 9 and title "Pending...". Then nothing happens, and the logs are empty. The message is:

Snapshot [1724776314.904378] exists in DB, but resource 1724776314.904378/singlefile.html does not exist in snapshot dir yet.

Maybe this resource type is not availabe for this Snapshot,
or the archiving process has not completed yet?

# run this cmd to finish archiving this Snapshot
archivebox update -t timestamp 1724776314.904378

Next, I ran the command archivebox update -t timestamp 1724776314.904378 in the terminal and archiving completed successfully.
Is this normal behavior?

The output of the version command:

user@archivebox:~/archivebox$ archivebox version
/usr/bin/chromium-browser: 12: xdg-settings: not found
0.7.1
ArchiveBox v0.7.1 BUILD_TIME=2024-08-15 11:52:06 1723722726
IN_DOCKER=False IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-5.15.0-118-generic-x86_64-with-glibc2.35 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=1000:1000 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.10.12        valid     /usr/bin/python3.10
 √  SQLITE_BINARY         v2.6.0          valid     /usr/lib/python3.10/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /home/user/.local/lib/python3.10/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.1          valid     /home/user/.local/bin/archivebox

 √  CURL_BINARY           v7.81.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.21.2         valid     /usr/bin/wget
 √  NODE_BINARY           v20.17.0        valid     /snap/bin/node
 √  SINGLEFILE_BINARY     v1.1.35         valid     ./node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.9          valid     ./node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/parser/cli.js
 √  GIT_BINARY            v2.34.1         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2024.08.06     valid     /home/user/.local/bin/yt-dlp
 √  CHROME_BINARY         v128.0.6613.84  valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /home/user/.local/lib/python3.10/site-packages/archivebox
 √  TEMPLATES_DIR         4 files         valid     /home/user/.local/lib/python3.10/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            7 files         valid     /home/user/archivebox
 √  SOURCES_DIR           34 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           26 files        valid     ./archive
 √  CONFIG_FILE           123.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             772.0 KB        valid     ./index.sqlite3


@pirate commented on GitHub (Sep 6, 2024):

This should be fixed in the latest v0.8.3-rc pre-release. Adding via the UI now starts the job in a background thread, so it shouldn't appear "stalled" like you saw.

https://github.com/ArchiveBox/ArchiveBox/releases/tag/v0.8.3-rc

Give that a try and let me know if you still encounter any issues.
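
For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of starting a slow job in a background thread so the caller gets control back immediately. It illustrates the general technique only and is not ArchiveBox's actual code. `archive_job`, `add_url` and the `results` dict are hypothetical names, and the job sleeps instead of fetching anything.

```python
import threading
import time

# Shared status table, keyed by URL (hypothetical stand-in for the snapshot index).
results = {}


def archive_job(url):
    """Stand-in for a slow archiving job; sleeps instead of fetching the page."""
    time.sleep(0.1)
    results[url] = "done"


def add_url(url):
    """Record the URL as pending, start the job in the background, return at once."""
    results[url] = "pending"
    worker = threading.Thread(target=archive_job, args=(url,), daemon=True)
    worker.start()
    return worker


worker = add_url("https://example.com")
print(results["https://example.com"])  # status shortly after the request returned
worker.join()                          # wait for the background job to finish
print(results["https://example.com"])  # status after the job has completed
```

The point of the design is that the request handler only records the work and hands it off, so the page can render while the job is still running.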
