[GH-ISSUE #1369] Feature Request: Add new generic_jsonl parser to support ingesting JSONL #838

Closed
opened 2026-03-01 14:46:40 +03:00 by kerem · 3 comments

Originally created by @jimwins on GitHub (Mar 1, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1369

Type

  • [ ] General question or discussion
  • [X] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

ArchiveBox cannot ingest JSONL (JSON Lines, one JSON object per line) files.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

This should be a fairly simple addition to the generic_json parser. When the file fails to parse with json.parse(), try again to parse it as JSONL before trying the case that skips the first line.
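
The fallback order proposed above can be sketched in Python as follows. This is a minimal illustration that assumes the input has already been read into a string. `parse_json_records` is a hypothetical name, not ArchiveBox's actual code, and the "skip the first line" step is modelled simply as dropping everything up to the first newline.

```python
import json
from typing import Any


def parse_json_records(text: str) -> Any:
    """Hypothetical fallback chain for the single-parser approach.

    1. Try the whole input as one JSON document (expected: a list of objects).
    2. If that fails, try it as JSONL: one JSON object per non-blank line.
    3. If that also fails, drop the first line and retry step 1.
    Raises json.JSONDecodeError if none of the three attempts succeeds.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    try:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    except json.JSONDecodeError:
        pass

    # Some exports prepend a non-JSON header line; drop it and try once more.
    _, _, rest = text.partition("\n")
    return json.loads(rest)
```

One weakness of this chain: a one-line JSONL file already succeeds at step 1, so it comes back as a single dict rather than as a list of records.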

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [X] It would be nice to have eventually

  • [X] I'm willing to contribute dev time / money to fix this issue
  • [X] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@pirate commented on GitHub (Mar 1, 2024):

Maybe we can do it as a separate parser? generic_jsonl

I think making the parsers more narrow and explicit and having more of them is likely a better approach going forward to avoid the issues we've had in the past with trying to cram a bunch of workaround behaviors into a single parser.
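
A minimal sketch of the narrower design suggested here, assuming a simple name-to-function registry. `parse_generic_json`, `parse_generic_jsonl`, `PARSERS` and `parse` are hypothetical names chosen for illustration, not ArchiveBox's real parser interface. Each parser accepts exactly one format and raises on anything else instead of silently falling back.

```python
import json
from typing import Any, Callable

# Hypothetical parser signature: raw text in, list of JSON objects out.
Parser = Callable[[str], list[dict[str, Any]]]


def parse_generic_json(text: str) -> list[dict[str, Any]]:
    """Strict JSON: the whole input must be a single top-level JSON array."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("generic_json expects a top-level JSON array")
    return data


def parse_generic_jsonl(text: str) -> list[dict[str, Any]]:
    """Strict JSONL: every non-blank line must be exactly one JSON object."""
    records: list[dict[str, Any]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        obj = json.loads(line)
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(obj)
    return records


# Hypothetical registry keyed by the parser names used in this thread.
PARSERS: dict[str, Parser] = {
    "generic_json": parse_generic_json,
    "generic_jsonl": parse_generic_jsonl,
}


def parse(text: str, parser_name: str) -> list[dict[str, Any]]:
    """Run only the explicitly named parser; no fallback to other formats."""
    return PARSERS[parser_name](text)
```

Because the caller names the parser, a file in the wrong format fails loudly (`json.JSONDecodeError` is a subclass of `ValueError`) instead of being reinterpreted by a workaround path.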


@jimwins commented on GitHub (Mar 1, 2024):

Yeah, now that I play around with it, we do need it to be a distinct parser, because a single-line JSONL file is also a valid JSON file, just not in the format that the generic_json parser expects. The two parsers can at least share the code that turns each JSON object into a Link, so that doesn't get duplicated.
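
The ambiguity and the shared-code idea can be illustrated with a small sketch. `Link` here is a hypothetical two-field stand-in for ArchiveBox's real `Link` class, and `json_object_to_link`, `links_from_json` and `links_from_jsonl` are illustrative names. The accepted key names (`url`/`href`, `title`/`description`) are likewise assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Link:
    """Hypothetical stand-in for ArchiveBox's Link; only two fields shown."""
    url: str
    title: Optional[str] = None


def json_object_to_link(obj: dict[str, Any]) -> Link:
    """Shared by both parsers: turn one JSON object into a Link.

    The key names checked here are assumptions for illustration.
    """
    url = obj.get("url") or obj.get("href")
    if not url:
        raise ValueError(f"no URL field in {obj!r}")
    return Link(url=url, title=obj.get("title") or obj.get("description"))


def links_from_json(text: str) -> list[Link]:
    # generic_json-style input: one document containing an array of objects.
    return [json_object_to_link(obj) for obj in json.loads(text)]


def links_from_jsonl(text: str) -> list[Link]:
    # generic_jsonl-style input: one JSON object per non-blank line.
    return [json_object_to_link(json.loads(line))
            for line in text.splitlines() if line.strip()]
```

For example, `json.loads('{"url": "https://example.com", "title": "Example"}')` succeeds and returns a dict, so that one-line file is valid JSON; `links_from_json` would then iterate over the dict's keys rather than over records, while `links_from_jsonl` returns `[Link(url='https://example.com', title='Example')]`.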


@pirate commented on GitHub (Mar 22, 2024):

Closing as completed, thanks @jimwins!
