[GH-ISSUE #870] Bug: ArchiveBox add for Wallabag Atom feed doesn't work #2046

Closed
opened 2026-03-01 17:56:02 +03:00 by kerem · 3 comments

Originally created by @m0nhawk on GitHub (Oct 2, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/870

Describe the bug

archivebox add doesn't work for a Wallabag Atom feed.

I initially noticed that archivebox schedule wasn't working, and traced the cause to the Wallabag Atom feed import failing.

Steps to reproduce

Run archivebox add:

archivebox add --parser=wallabag_atom --depth=1 https://wallabag.../feed/user/token/all

Screenshots or log output

[i] [2021-10-02 05:59:25] ArchiveBox v0.6.2: archivebox add --parser=wallabag_atom --depth=1 https://wallabag.../feed/user/token/all
    > /data

[+] [2021-10-02 05:59:26] Adding 1 links to index (crawl depth=1)...
    > Saved verbatim input to sources/1633154366-import.txt

[X] No links found using Wallabag Atom parser
    Hint: Try a different parser or double check the input?

    > Parsed 0 URLs from input (Wallabag Atom)
    > Found 0 new URLs not already in index

[*] [2021-10-02 05:59:26] Writing 0 links to main index...
    √ ./index.sqlite3

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-5.11.0-36-generic-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=sonic

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl
 -  WGET_BINARY           -               disabled  /usr/bin/wget
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor
 -  MERCURY_BINARY        -               disabled  /node/node_modules/@postlight/mercury-parser/cli.js
 -  GIT_BINARY            -               disabled  /usr/bin/git
 -  YOUTUBEDL_BINARY      -               disabled  /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled
 -  COOKIES_FILE          -               disabled

[i] Data locations:
 √  OUTPUT_DIR            9 files         valid     /data
 √  SOURCES_DIR           35 files        valid     ./sources
 √  LOGS_DIR              2 files         valid     ./logs
 √  ARCHIVE_DIR           102 files       valid     ./archive
 √  CONFIG_FILE           420.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             1.1 MB          valid     ./index.sqlite3
kerem closed this issue 2026-03-01 17:56:03 +03:00

@mflis commented on GitHub (Jan 11, 2022):

The add command won't download the feed file for you. Here's an example of the intended usage: https://github.com/ArchiveBox/ArchiveBox/wiki/Scheduled-Archiving#example-import-an-rss-feed-from-pocket-every-12-hours
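The distinction in the comment above is between the feed URL and the feed document. With a URL argument, the chosen parser only sees the verbatim input text (the URL itself, as the "Saved verbatim input to sources/...-import.txt" line in the log shows), so an Atom parser finds no entries. The Python sketch below illustrates this using only the standard library. It is not ArchiveBox's own parser: `fetch_feed`, `extract_entry_urls` and `SAMPLE` are hypothetical names, and the assumption that Wallabag puts the original article URL in a `rel="via"` link (falling back to `rel="alternate"`) is not confirmed by this thread.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def fetch_feed(url: str, timeout: float = 30.0) -> str:
    """Download the feed body, i.e. the step that must happen before parsing."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _link_url(link: ET.Element) -> str:
    # Atom normally stores the URL in href; fall back to the element text
    # in case a feed (ASSUMPTION: possibly Wallabag's) puts it there instead.
    return (link.get("href") or link.text or "").strip()


def extract_entry_urls(feed_text: str) -> List[str]:
    """Return at most one URL per Atom <entry>.

    ASSUMPTION: prefer <link rel="via">, then <link rel="alternate">
    (a missing rel attribute means "alternate" in Atom).
    """
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # A bare URL string is not XML, so there are no entries to find.
        return []
    urls = []
    for entry in root.iter(f"{ATOM_NS}entry"):
        links = entry.findall(f"{ATOM_NS}link")
        for rel in ("via", "alternate"):
            candidates = [_link_url(l) for l in links if l.get("rel", "alternate") == rel]
            candidates = [c for c in candidates if c]
            if candidates:
                urls.append(candidates[0])
                break
    return urls


# Hypothetical, hand-written feed excerpt (not taken from the reporter's feed).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link rel="alternate" type="text/html" href="https://wallabag.example.com/view/1"/>
    <link rel="via" href="https://example.com/article-1"/>
  </entry>
  <entry>
    <link rel="alternate" href="https://example.com/article-2"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    # Parsing the URL text itself finds nothing, mirroring "Parsed 0 URLs".
    print(extract_entry_urls("https://wallabag.../feed/user/token/all"))  # []
    # Parsing the feed document yields the entry URLs.
    print(extract_entry_urls(SAMPLE))
    # Real use would be: extract_entry_urls(fetch_feed(FEED_URL)), where
    # FEED_URL is your own (redacted here) Wallabag feed URL.
```

The extracted URLs could then be written one per line and piped into archivebox add on stdin; alternatively, the downloaded feed file itself can be piped in with --parser=wallabag_atom, which is closer to the scheduled-import workflow in the linked wiki page.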


@pirate commented on GitHub (Jan 11, 2022):

Can you post a redacted excerpt of the Atom feed you're trying to add? I can take a look later.


@pirate commented on GitHub (Mar 27, 2024):

This should work now with feedparser in the latest :dev versions. Comment back if you're still having trouble and I can reopen it.
