[PR #669] [MERGED] add command: --parser option (fixes #235) #4273

Closed
opened 2026-03-15 01:35:48 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/669
Author: @FliegendeWurst
Created: 3/20/2021
Status: Merged
Merged: 3/31/2021
Merged by: @pirate

Base: dev ← Head: fix-issue-235


📝 Commits (2)

  • 60bd9a9 add command: --parser option
  • 2656e59 change list style

📊 Changes

6 files changed (+73 additions, -21 deletions)

📝 archivebox/cli/archivebox_add.py (+9 -0)
📝 archivebox/index/__init__.py (+2 -2)
📝 archivebox/main.py (+2 -1)
📝 archivebox/parsers/__init__.py (+30 -17)
📝 archivebox/parsers/generic_txt.py (+1 -1)
➕ archivebox/parsers/url_list.py (+29 -0)

📄 Description

Summary

This PR adds an "input format" option, --parser, to archivebox add. When it is set to a value other than the default, the input is parsed only as that format. If the url-list format is specified, each non-empty line of the input files is added as a URL as-is.
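
To illustrate the behavior described above, here is a minimal sketch and not the code from this PR. All names in it (parse_url_list, parse_generic_txt, PARSERS, parse_links, and the "auto" default value) are hypothetical; the real implementation lives in archivebox/parsers/ and may be structured differently.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical names throughout; this is a sketch, not the PR's implementation.
Parser = Callable[[str], Iterator[str]]


def parse_generic_txt(text: str) -> Iterator[str]:
    """Stand-in for a looser parser: yield tokens that look like http(s) URLs."""
    for token in text.split():
        if token.startswith(("http://", "https://")):
            yield token


def parse_url_list(text: str) -> Iterator[str]:
    """Yield every non-empty line as a URL, stripped but otherwise unchecked."""
    for line in text.splitlines():
        url = line.strip()
        if url:
            yield url


# Order matters only for the assumed "auto" mode below.
PARSERS: Dict[str, Parser] = {
    "generic_txt": parse_generic_txt,
    "url-list": parse_url_list,
}


def parse_links(text: str, parser: str = "auto") -> List[str]:
    """Use only the named parser, or try each in turn when parser is "auto"."""
    if parser != "auto":
        return list(PARSERS[parser](text))
    for parse in PARSERS.values():
        links = list(parse(text))
        if links:
            return links
    return []
```

Under these assumptions, parse_links("see https://example.com now") returns ["https://example.com"] in the default mode, while passing parser="url-list" returns the whole line, ["see https://example.com now"], because that format does no URL detection of its own.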

Related issues

#235

Stuff not yet done/determined

  • should the other parsers also be available using this option?
  • documentation updates (especially the wiki section "Import a list of URLs from a text file")

Changes these areas

  • [ ] Bugfixes
  • [x] Feature behavior
  • [x] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
