[GH-ISSUE #990] Feature Request: Add MERCURY_ARGS extractor option to enable saving article text as .md markdown #3636

Open
opened 2026-03-14 23:50:23 +03:00 by kerem · 1 comment

Originally created by @cmuc24 on GitHub (Jun 16, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/990

As an admin, I need a Markdown output format, similar to the existing readability format,
so that .md items can be picked up by external downstream processes.

the ideal specific solution

As a user, I can define MD as an (additional) format for my (to-be-archived) items,
to get these items into our publishing workflow and, as a result,
back into our commonly used MD-based knowledge base.

Good for applications like MkDocs, Trilium, or other similar MD-based applications.

actual solution

At the moment, an MD web clipper browser extension does the job, with some additional options as well.
https://github.com/deathau/markdownload (MIT and Apache2 licenses only)

how important

As far as I can see (it is not entirely clear to me), a twin of the readability extractor, paired with the good parts of e.g. the .md web clipper mentioned above, could be low-hanging fruit!? :-)

Edit: if the primary focus is on converting, then an info.json could solve it (like yt-download does).
Low-code people like me can work better with lightweight formats :)

Personally, I'd say it's an improvement that can increase efficiency (and not only that) in downstream processes.

  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up (I'll describe it outside this feature request)

@ntevenhere commented on GitHub (Sep 12, 2022):

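The mercury extractor supports --format=markdown (see https://github.com/postlight/parser/issues/316). In ArchiveBox the option is currently set to --format=text (https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/extractors/mercury.py#L68). You just need a way to change the argument.

(MERCURY_ARGS config option when? 😄 I could work on that, actually.)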
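
To make the idea concrete, here is a rough sketch of how such an option might be wired in. It is not ArchiveBox's actual code: `MERCURY_ARGS` is only the option proposed in this issue, and `build_mercury_cmd`, `output_extension`, the `mercury-parser` binary name, and the extension mapping are all assumptions made for illustration.

```python
import os
import shlex

# Default mirrors the argument this comment says ArchiveBox currently passes.
DEFAULT_MERCURY_ARGS = "--format=text"


def build_mercury_cmd(url, binary="mercury-parser", env=None):
    """Build the argv list for a mercury-parser call (hypothetical helper)."""
    env = os.environ if env is None else env
    # MERCURY_ARGS is the option proposed in this issue, not an existing setting.
    extra_args = shlex.split(env.get("MERCURY_ARGS", DEFAULT_MERCURY_ARGS))
    return [binary, url, *extra_args]


def output_extension(cmd):
    """Pick a file extension matching the requested --format (assumed mapping)."""
    extensions = {"markdown": ".md", "html": ".html", "text": ".txt"}
    for arg in cmd:
        if arg.startswith("--format="):
            return extensions.get(arg.split("=", 1)[1], ".txt")
    return ".txt"
```

With `MERCURY_ARGS="--format=markdown"` set in the environment, the helper would produce a command ending in `--format=markdown`, and the output could then be saved with a `.md` extension; with the variable unset, the behaviour stays at `--format=text`.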
