[GH-ISSUE #1466] Bug: URL gets truncated #870

Closed
opened 2026-03-01 14:46:57 +03:00 by kerem · 1 comment

Originally created by @sclu1034 on GitHub (Jul 10, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1466

Describe the bug

When attempting to add https://web.archive.org/web/20230817173759/http://wiki.xentax.com/index.php/Wwise_SoundBank_(*.bnk), it gets silently truncated to https://web.archive.org/web/20230817173759/http://wiki.xentax.com/index.php/Wwise_SoundBank_ before starting the snapshots.
Since the URL is now incorrect, the snapshot fails (or pulls useless content, in this particular case).

Running curl 'https://web.archive.org/web/20230817173759/http://wiki.xentax.com/index.php/Wwise_SoundBank_(*.bnk)' manually works just fine.
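
One plausible way a URL could be cut off exactly at the opening parenthesis is a URL-extraction regex that treats `(` and `)` as terminators. The sketch below illustrates that failure mode only; it is not ArchiveBox's actual code. Both patterns and the `extract_first_url` helper are hypothetical names introduced for this example.

```python
import re

# Hypothetical pattern (NOT taken from ArchiveBox): stops at whitespace,
# quotes, angle brackets, square brackets, and parentheses.
NAIVE_URL_REGEX = re.compile(r'https?://[^\s<>"\'\[\]()]+')

# Hypothetical, more permissive variant: stops only at whitespace,
# quotes, and angle brackets, so parentheses stay part of the URL.
PERMISSIVE_URL_REGEX = re.compile(r'https?://[^\s<>"\']+')

# The URL from this report.
URL = (
    "https://web.archive.org/web/20230817173759/"
    "http://wiki.xentax.com/index.php/Wwise_SoundBank_(*.bnk)"
)


def extract_first_url(text, pattern):
    """Return the first substring of text matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


naive = extract_first_url(URL, NAIVE_URL_REGEX)
permissive = extract_first_url(URL, PERMISSIVE_URL_REGEX)

print(naive)       # ends at "Wwise_SoundBank_", like the truncated URL above
print(permissive)  # keeps the trailing "(*.bnk)"
```

The permissive variant has its own trade-off: when URLs are extracted from free text, it would also swallow a closing parenthesis that belongs to the surrounding sentence rather than to the URL.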

Steps to reproduce

  1. Add the above URL via the UI
  2. Check the URL in the resulting snapshot list

ArchiveBox version

0.7.2
ArchiveBox v0.7.2 COMMIT_HASH=315c9f3 BUILD_TIME=2024-04-24 22:47:02 1713998822
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.1.79-Unraid-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=999:100 FS_PERMS=644
DEBUG=False IS_TTY=False TZ=UTC SEARCH_BACKEND=sonic LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.9         valid     /usr/local/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /usr/local/bin/archivebox

 √  CURL_BINARY           v8.5.0          valid     /usr/bin/curl
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget
 √  NODE_BINARY           v20.12.2        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v1.1.46         valid     /app/node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.11         valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v124.0.6367.29  valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            7 files @       valid     /data
 √  SOURCES_DIR           5057 files      valid     ./sources
 √  LOGS_DIR              2 files         valid     ./logs
 √  ARCHIVE_DIR           1602 files @    valid     ./archive
 √  CONFIG_FILE           195.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             31.9 MB         valid     ./index.sqlite3
kerem closed this issue 2026-03-01 14:46:57 +03:00

@pirate commented on GitHub (Jul 10, 2024):

Duplicate of https://github.com/ArchiveBox/ArchiveBox/issues/864
