[GH-ISSUE #797] Enhancement: Change log message during parsing to show it's counting number of lines in input file not number of URLs #503

Open
opened 2026-03-01 14:44:10 +03:00 by kerem · 0 comments
Owner

Originally created by @philippemilink on GitHub (Jul 18, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/797

Hello,

I find the log messages printed when importing links from a JSON file a bit confusing. Suppose I want to add a JSON file with the following structure (as produced by Python's json.dumps(); notice the total number of lines compared to the number of links!):

[
    {
        "url": "http://foo",
        "tags": "foo"
    },
    {
        "url": "https://bar",
        "tags": "bar"
    },
    {
        "url": "https://foobar",
        "tags": "foo,bar"
    },
    ...
]

ArchiveBox says:

% archivebox add --parser json < links.json
[+] [2021-07-18 13:07:34] Adding 1263 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1626613654-import.txt
    > Parsed 315 URLs from input (Generic JSON)
...

The number 1263 in "Adding 1263 links to index" is not the number of links, but the number of lines in the JSON file. The number in this message should be 315, the same as in "Parsed 315 URLs from input" (alternatively, the message could say something like "Parsing 1263 lines").
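To illustrate the mismatch, here is a minimal Python sketch. It is not ArchiveBox's actual code: the function name `count_lines_and_urls` is hypothetical, and I am assuming the input was written with `json.dumps(..., indent=4)`, which matches the layout shown above. It simply shows that counting lines of the raw input and counting parsed entries give different numbers for the same file.

```python
import json
from typing import Tuple


def count_lines_and_urls(raw_text: str) -> Tuple[int, int]:
    """Return (number of lines in the raw input, number of entries with a "url" key).

    Hypothetical helper for illustration only; not part of ArchiveBox.
    """
    # Counting lines of the raw text, as the first log message appears to do.
    line_count = len(raw_text.splitlines())

    # Counting the entries that actually carry a URL after parsing the JSON.
    entries = json.loads(raw_text)
    url_count = sum(
        1 for entry in entries if isinstance(entry, dict) and "url" in entry
    )
    return line_count, url_count


# Three sample entries in the same shape as the file above.
links = [
    {"url": "http://foo", "tags": "foo"},
    {"url": "https://bar", "tags": "bar"},
    {"url": "https://foobar", "tags": "foo,bar"},
]

# Assumption: indent=4 is what produces one key per line.
raw = json.dumps(links, indent=4)

line_count, url_count = count_lines_and_urls(raw)
print(f"Lines in input: {line_count}")
print(f"Links in input: {url_count}")
```

With this pretty-printed layout each link occupies four lines, so a line count will always be roughly four times the number of links, which is why the first log message looks so far off.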

The first time I saw this message, I wondered whether ArchiveBox has a limit on the number of links it can process in a single archivebox add!
