[GH-ISSUE #245] adding txt file with multiple links throws JSONDecodeError #1681

Closed
opened 2026-03-01 17:52:48 +03:00 by kerem · 3 comments

Originally created by @Strubbl on GitHub (May 26, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/245

Describe the bug

Archiving a list of links no longer works. The links.txt file contains many URLs, one per line, and no JSON at all.

I am trying to add multiple links as described in the documentation: https://github.com/pirate/ArchiveBox/wiki/Usage#import-a-single-url-or-list-of-urls-via-stdin

Screenshots or log output

$ cat /data/archivebox/links.txt | docker run -i -v /data/archivebox/output:/data strubbl_archivebox /bin/archive
Traceback (most recent call last):
  File "/bin/archive", line 136, in <module>
    main(*sys.argv)
  File "/bin/archive", line 98, in main
    update_archive_data(import_path=import_path, resume=resume)
  File "/bin/archive", line 106, in update_archive_data
    all_links, new_links = load_links_index(out_dir=OUTPUT_DIR, import_path=import_path)
  File "/home/pptruser/app/archivebox/index.py", line 61, in load_links_index
    existing_links = parse_json_links_index(out_dir)
  File "/home/pptruser/app/archivebox/index.py", line 108, in parse_json_links_index
    links = json.load(f)['links']
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Software versions

  • OS: Docker container
  • ArchiveBox version: b109dd6

@pirate commented on GitHub (May 27, 2019):

This looks like a corrupted index rather than a parsing error. Can you post your redacted index.json file?
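For context, the last line of the traceback is the message Python's json module produces for empty input, which points at the index file that json.load was reading rather than at links.txt. A minimal sketch of that behaviour follows; the helper name describe_json_failure is made up for illustration and is not part of ArchiveBox.

```python
import json


def describe_json_failure(text):
    """Return the decoder's error message for `text`, or None if it parses.

    Hypothetical helper for illustration only; not ArchiveBox code.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # str(err) has the form "<msg>: line <n> column <n> (char <n>)"
        return str(err)
    return None


if __name__ == "__main__":
    # An empty string stands in for the contents of an empty file.
    print(describe_json_failure(""))
    # A well-formed document with a "links" key parses without error.
    print(describe_json_failure('{"links": []}'))
```

The position "char 0" means the decoder found nothing usable at the very start of the input, so the file is either empty or does not begin with a JSON value.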


@Strubbl commented on GitHub (May 27, 2019):

You are right. My index.json is 0 bytes in size. I had not checked that.

How can I recover from this situation? By deleting the JSON file?
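A quick way to confirm this state before changing anything is to classify the index file and, if it is unusable, move it aside instead of deleting it. The sketch below is a generic precaution and an assumption, not ArchiveBox code and not necessarily the procedure described in any linked ticket. The function names classify_index and set_aside and the ".bak" suffix are made up for illustration; the "links" key is taken from the traceback above.

```python
import json
import os


def classify_index(path):
    """Return 'missing', 'empty', 'invalid', or 'ok' for a JSON index file."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except ValueError:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return "invalid"
    # The traceback shows the loader reading json.load(f)['links'].
    if not isinstance(data, dict) or "links" not in data:
        return "invalid"
    return "ok"


def set_aside(path):
    """Rename the file to '<path>.bak' (hypothetical suffix) and return the new path."""
    backup = path + ".bak"
    os.replace(path, backup)
    return backup


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the output folder is mounted.
    index_path = os.path.join("output", "index.json")
    state = classify_index(index_path)
    print(state)
    if state in ("empty", "invalid"):
        print("moved to", set_aside(index_path))
```

Renaming keeps the original bytes available in case they are needed later; whether the archiver then rebuilds the index on its own is not something this sketch can confirm.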


@pirate commented on GitHub (May 31, 2019):

I think you hit this bug; you can follow the instructions on that ticket: https://github.com/pirate/ArchiveBox/issues/234
