mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[PR #1370] [MERGED] Add generic_jsonl parser #1392
📋 Pull Request Information
Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1370
Author: @jimwins
Created: 3/1/2024
Status: ✅ Merged
Merged: 3/15/2024
Merged by: @pirate
Base: dev ← Head: issue-1369
📝 Commits (10+)
4e69d2c Add EXTRA_*_ARGS for wget, curl, and singlefile
ab8f395 Add YOUTUBEDL_EXTRA_ARGS
4d9c5a7 Add CHROME_EXTRA_ARGS
22f9a28 Use feedparser for RSS parsing in generic_rss and pinboard_rss parsers
68326a6 Add cookies file to http request in download_url
fe11e1c check if COOKIE_FILE is file
89ab18c Add generic_jsonl parser
a577d1e Merge branch 'dev' into title-cookies-file
1f828d9 Add tests for generic_rss and pinboard_rss parsers
9f462a8 Use feedparser for RSS parsing in generic_rss and pinboard_rss parsers
📊 Changes
23 files changed (+495 additions, -174 deletions)
📝 archivebox/config.py (+47 -23)
📝 archivebox/extractors/archive_org.py (+9 -2)
📝 archivebox/extractors/favicon.py (+13 -3)
📝 archivebox/extractors/headers.py (+9 -3)
📝 archivebox/extractors/media.py (+9 -2)
📝 archivebox/extractors/mercury.py (+10 -4)
📝 archivebox/extractors/singlefile.py (+6 -19)
📝 archivebox/extractors/title.py (+9 -2)
📝 archivebox/extractors/wget.py (+10 -3)
📝 archivebox/parsers/__init__.py (+2 -0)
📝 archivebox/parsers/generic_json.py (+57 -53)
➕ archivebox/parsers/generic_jsonl.py (+34 -0)
📝 archivebox/parsers/generic_rss.py (+20 -28)
📝 archivebox/parsers/pinboard_rss.py (+16 -25)
📝 archivebox/util.py (+40 -5)
📝 bin/test.sh (+1 -1)
📝 pyproject.toml (+3 -0)
📝 tests/mock_server/server.py (+1 -1)
➕ tests/mock_server/templates/example-single.jsonl (+1 -0)
➕ tests/mock_server/templates/example.atom (+24 -0)
...and 3 more files
📄 Description
Adds a JSONL parser and also fixes the JSON parser to reject what it suspects is a single-line JSONL file.
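The distinction the description draws can be illustrated with a minimal sketch. This is not the code from this PR: the function names `parse_jsonl_lines` and `parse_json_document` are hypothetical, and the rule used to "suspect" single-line JSONL (the top-level JSON value is not a list) is an assumption made for illustration only.

```python
import json
from typing import Iterator


def parse_jsonl_lines(text: str) -> Iterator[dict]:
    """Hypothetical helper: decode one JSON object per non-blank line."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def parse_json_document(text: str) -> list:
    """Hypothetical helper: decode the whole text as a single JSON document.

    Assumption for this sketch: a JSON export is a top-level list of entries,
    so any other top-level value is treated as probable single-line JSONL
    and rejected, leaving it for the JSONL parser to handle.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of entries; input may be JSONL")
    return data


single_line = '{"url": "https://example.com/a"}'
two_lines = single_line + '\n\n{"url": "https://example.com/b"}\n'

urls = [entry["url"] for entry in parse_jsonl_lines(two_lines)]
```

A file containing a single JSON object on one line is valid as both JSON and JSONL, which is why the JSON side needs some rule to decline it; the list check above is one simple way to do that under the stated assumption.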
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.