[GH-ISSUE #287] Incorrect handling of parentheses in url #205

Closed
opened 2026-03-01 14:41:31 +03:00 by kerem · 1 comment
Owner

Originally created by @ghost on GitHub (Oct 18, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/287

Describe the bug

Attempting to archive `https://en.m.wikipedia.org/wiki/A_good_day_to_die_(phrase)` archives `https://en.m.wikipedia.org/wiki/A_good_day_to_die_` instead: the URL is cut off at the opening parenthesis.

Steps to reproduce

Run `echo 'https://en.m.wikipedia.org/wiki/A_good_day_to_die_(phrase)' | /bin/archive`

Screenshots or log output

echo "https://en.m.wikipedia.org/wiki/A_good_day_to_die_(phrase)" | sudo docker run -i --rm -v ~/ArchiveBox:/data:rw,z nikisweeting/archivebox
[*] [2019-10-18 03:09:37] Parsing new links from output/sources/stdin-1571368177.txt...
    > Adding 1 new links to index (parsed import as Plain Text)
[*] [2019-10-18 03:09:37] Saving main index files...
    √ /data/index.json
    √ /data/index.html
[▶] [2019-10-18 03:09:37] Updating content for 1 pages in archive...

[+] [2019-10-18 03:09:37] "https://en.m.wikipedia.org/wiki/A_good_day_to_die_"
    https://en.m.wikipedia.org/wiki/A_good_day_to_die_
    > /data/archive/1571368177
      > title
      > favicon
      > wget
      > pdf
      > screenshot
      > dom
      > media
      > archive_org
[√] [2019-10-18 03:09:59] Update of 1 pages complete (22.14 sec)
    - 0 links skipped
    - 1 links updated
    - 0 links had errors
    To view your archive, open: /data/index.html
[*] [2019-10-18 03:09:59] Saving main index files...
    √ /data/index.json
    √ /data/index.html

Software versions

  • OS: ([e.g. macOS 10.14] the operating system you're running ArchiveBox on)
  • ArchiveBox version: (git rev-parse HEAD | head -c7 [e.g. d798117] commit ID of the version you're running)
  • Python version: (python3 --version [e.g. 3.7.0])
  • Chrome version: (chromium-browser --version [e.g. 73.1.2.3] if relevant to bug)
kerem closed this issue 2026-03-01 14:41:31 +03:00
Author
Owner

@pirate commented on GitHub (Oct 18, 2019):

Thanks for reporting! We're aware of this issue, and it's trickier than it looks on the surface. For now, you can import your links as JSON, XML, RSS, etc. to avoid this problem (let me know if you need help with doing that).

Duplicate: https://github.com/pirate/ArchiveBox/issues/235
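
To illustrate why plain-text parsing is tricky here, the following is a minimal sketch and not ArchiveBox's actual parser. Both regex patterns and both function names are hypothetical, chosen only for this example. The first extractor stops at parentheses, which is convenient when URLs appear wrapped in parentheses in free text but truncates the Wikipedia URL above. The second allows parentheses and then trims trailing `)` characters that have no matching `(` inside the URL.

```python
import re

# Hypothetical pattern (an assumption, not ArchiveBox's regex):
# stops at whitespace, angle brackets, and parentheses.
NAIVE_URL = re.compile(r'https?://[^\s()<>]+')

# Hypothetical alternative: parentheses are allowed inside the URL.
GREEDY_URL = re.compile(r'https?://[^\s<>]+')


def extract_naive(text):
    """Return URLs found in text, cutting each one at the first parenthesis."""
    return NAIVE_URL.findall(text)


def extract_balanced(text):
    """Return URLs found in text, keeping parentheses that are balanced."""
    urls = []
    for url in GREEDY_URL.findall(text):
        # Drop trailing ")" characters that have no matching "(" in the URL,
        # e.g. when the URL itself was wrapped in parentheses in prose.
        while url.endswith(')') and url.count(')') > url.count('('):
            url = url[:-1]
        urls.append(url)
    return urls


wiki = 'https://en.m.wikipedia.org/wiki/A_good_day_to_die_(phrase)'
print(extract_naive(wiki))      # truncated at "(" as in the log output above
print(extract_balanced(wiki))   # full URL preserved
print(extract_balanced('see (https://example.com/page) for details'))
```

The balancing step is only a heuristic: a URL that legitimately ends in an unmatched `)` would still be trimmed, which is one reason structured formats such as JSON, XML, or RSS sidestep the problem.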
