mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #971] Bug: Parsing Wallabag RSS feed fails #3623
Originally created by @peterrus on GitHub (May 1, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/971
Describe the bug
I have a setup where (through a cronjob) ArchiveBox fetches an RSS feed of my archived (aka read) articles from Wallabag.it and imports them. This way I have a redundant archive of everything I read in Wallabag. Overkill? Maybe.
Somewhere around 2022-03-31 the parsing of this RSS feed started to fail with the following error:
Not long before 2022-03-31, Wallabag released a new version, https://github.com/wallabag/wallabag/releases/tag/2.4.3, which includes a PR that modifies the formatting of the RSS feed it provides: https://github.com/wallabag/wallabag/pull/5347. I suspect this is the culprit.
I am not exactly sure where the responsibility for fixing this lies, but I want to at least document that I ran into this in case someone else experiences a similar issue.
Steps to reproduce
I have created a test account with one archived article on Wallabag.it. This account will expire in 14 days, but you can easily create a new one for testing purposes.
curl https://app.wallabag.it/feed/dokafad/TDzxV9ejsZiWMq/archive | archivebox add --parser=wallabag_atom
Screenshots or log output
Full log
ArchiveBox version
@peterrus commented on GitHub (May 3, 2022):
Also created a dedicated issue in the Wallabag project, see above.
@pirate commented on GitHub (May 10, 2022):
I fixed it. It's a mildly annoying change in their export format: they started inserting newline wrapping in the middle of XML tags, which broke my janky parser.
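For anyone hitting a similar problem, here is a minimal sketch of why line wrapping inside a tag can break a line-oriented regex parser while a real XML parser is unaffected. It is not ArchiveBox's actual parser or the fix in the commit; the sample feed, the choice of the `<link>` tag as the wrapped one, and both function names are assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed snippet (not real Wallabag output): the <link> tag is
# wrapped across two lines in the middle of the tag.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example article</title>
    <link rel="alternate" type="text/html"
          href="https://example.com/article"/>
    <published>2022-03-31T12:00:00+00:00</published>
  </entry>
</feed>
"""


def parse_with_line_regex(feed_text):
    """Fragile approach: assumes every <link .../> tag fits on one line."""
    pattern = re.compile(r'<link[^>]*href="([^"]+)"[^>]*/>')
    urls = []
    for line in feed_text.splitlines():
        urls.extend(match.group(1) for match in pattern.finditer(line))
    return urls


def parse_with_xml_parser(feed_text):
    """Robust approach: whitespace inside a tag is irrelevant to an XML parser."""
    root = ET.fromstring(feed_text.encode("utf-8"))
    urls = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        if link is not None and link.get("href"):
            urls.append(link.get("href"))
    return urls


if __name__ == "__main__":
    print("line regex:", parse_with_line_regex(SAMPLE_FEED))
    print("xml parser:", parse_with_xml_parser(SAMPLE_FEED))
```

In this sketch the regex version finds nothing because neither physical line contains the whole tag, while the ElementTree version still returns the article URL.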
acd53c8
@peterrus commented on GitHub (May 10, 2022):
hey @pirate, thanks for this! One problem though :p I am getting some output that worries me:
After streaming a whole bunch of these errors the actual archival process does start, and does seem to work.
I am running the docker image with tag sha-eb77908 btw.
@pirate commented on GitHub (Sep 5, 2023):
Whoops, reopened by accident, ignore that. Tracking the latest Wallabag issue over here: #1000