[PR #822] [MERGED] Fix Pinboard RSS parsing valid links as None #1277

Closed
opened 2026-03-01 14:49:08 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/822
Author: @overhacked
Created: 8/4/2021
Status: Merged
Merged: 8/4/2021
Merged by: @pirate

Base: dev ← Head: bug_pinboard_rss


📝 Commits (1)

  • f6cf35a Fix Pinboard RSS parsing valid links as None

📊 Changes

1 file changed (+6 additions, -1 deletions)

📝 archivebox/parsers/pinboard_rss.py (+6 -1)

📄 Description

Summary

Fixes #821.

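item.find(p) returns either an ElementTree.Element or None. The lambda on line 24 tests that return value in a boolean context, and an Element evaluates to False when it has no child elements (see ElementTree.py line 207). A <link> element contains only text, never child elements, so the lambda returns None even when the link is present and valid.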
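
The truthiness pitfall described above can be reproduced with a small sketch. The sample XML and the helper names `find_text_truthy` and `find_text_fixed` are hypothetical and written for illustration; they are not the code from `pinboard_rss.py`. Truth-testing an `Element` is deprecated in recent Python versions and its result may change in the future, so the buggy helper is shown only for contrast.

```python
import warnings
from xml.etree import ElementTree

RSS_NS = "{http://purl.org/rss/1.0/}"

# Hypothetical minimal RSS 1.0 document, not taken from a real Pinboard feed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item>
    <title>Example</title>
    <link>https://example.com/</link>
  </item>
</rdf:RDF>"""


def find_text_truthy(item, path):
    """Buggy pattern: relies on the truth value of the Element itself."""
    elem = item.find(path)
    with warnings.catch_warnings():
        # Truth-testing an Element emits a warning on some Python versions.
        warnings.simplefilter("ignore")
        # On versions where a childless Element is falsy, this yields None
        # even though the element exists and has text.
        return elem.text.strip() if elem else None


def find_text_fixed(item, path):
    """Safer pattern: compare against None explicitly."""
    elem = item.find(path)
    if elem is None or elem.text is None:
        return None
    return elem.text.strip()


item = ElementTree.fromstring(SAMPLE).find(RSS_NS + "item")
url = find_text_fixed(item, RSS_NS + "link")
missing = find_text_fixed(item, RSS_NS + "description")
print(url, missing)
```

The explicit `is None` comparison does not depend on how many children the element has, which is why it is the form the `ElementTree` documentation recommends for this kind of check.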

Further, returning a Link with url=None violates an assertion in index/schema.py, which crashes the archivebox add command.
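
One defensive option, sketched below under stated assumptions, is to skip any item without a usable URL so that a record with `url=None` never reaches a schema-level check. `ParsedLink`, `text_of`, and `parse_items` are hypothetical names; `ParsedLink` is a stand-in and not ArchiveBox's `Link` class, and the feed is invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional
from xml.etree import ElementTree

RSS_NS = "{http://purl.org/rss/1.0/}"


@dataclass(frozen=True)
class ParsedLink:
    """Hypothetical stand-in for a link record; not ArchiveBox's Link."""
    url: str
    title: Optional[str]

    def __post_init__(self):
        # Stand-in for a schema-level assertion that every record has a URL.
        assert isinstance(self.url, str) and self.url, "url must be a non-empty string"


def text_of(item, path):
    """Return the stripped text of a child element, or None if absent or empty."""
    elem = item.find(path)
    if elem is None or elem.text is None:
        return None
    return elem.text.strip()


def parse_items(xml_text: str) -> Iterator[ParsedLink]:
    root = ElementTree.fromstring(xml_text)
    for item in root.findall(RSS_NS + "item"):
        url = text_of(item, RSS_NS + "link")
        if not url:
            # Skip items with no link instead of building an invalid record.
            continue
        yield ParsedLink(url=url, title=text_of(item, RSS_NS + "title"))


# Hypothetical feed: one item with a link, one without.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item><title>Has link</title><link>https://example.com/a</link></item>
  <item><title>No link</title></item>
</rdf:RDF>"""

links = list(parse_items(FEED))
print(links)
```

Skipping is only one choice; a parser could instead log or raise a descriptive error for items that lack a link, depending on how strict the import should be.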

Changes these areas

  • [x] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Links referenced in the description

  • Lambda on line 24: https://github.com/ArchiveBox/ArchiveBox/blob/3d54b1321bf8c56627aaa50efcc809cd99caee52/archivebox/parsers/pinboard_rss.py#L24
  • ElementTree.py line 207: https://github.com/python/cpython/blob/3d8993a744813c5144851da5347d7b4b1885f234/Lib/xml/etree/ElementTree.py#L207
  • Assertion in index/schema.py: https://github.com/ArchiveBox/ArchiveBox/blob/3d54b1321bf8c56627aaa50efcc809cd99caee52/archivebox/index/schema.py#L165
  • Commit f6cf35a: https://github.com/ArchiveBox/ArchiveBox/commit/f6cf35a45d41f911e02d275398ef8b6a9efa51a5