mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #778] Feature Request: add BDfR as a new extractor for archiving Reddit content #3513
Originally created by @pirate on GitHub (Jul 2, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/778
Discussed in https://github.com/ArchiveBox/ArchiveBox/discussions/754
Originally posted by BlipRanger May 24, 2021
Just wanted to make a quick mention of BDfR as a cool project that might make for a good starting point for the unrolling of Reddit comments/posts as mentioned in the roadmap. They currently support grabbing a variety of media types from the post, as well as the comments/text in a separate (JSON) file. I've been working on an addon for it lately and I think it's a pretty great project with well-maintained code. If nothing else, they have really good examples of working with Reddit data which could be useful! Just wanted to bring that to your attention!
I'd love to add BDfR as an extractor for Reddit content (and something similar for Twitter too https://github.com/ArchiveBox/ArchiveBox/issues/345) but am somewhat swamped with work and travel for the near future.
If you @BlipRanger or anyone else wants to add it as an extractor (matching the style of our other extractors, e.g. archivebox/extractors/media.py is a great example to copy), I'd be happy to review PRs!
We have some good instructions for contributing a new extractor and getting started with ArchiveBox development in general:
@pirate commented on GitHub (Oct 20, 2023):
We use Mercury (recently renamed postlight) as an extractor already, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the archivebox side:
@rmelotte commented on GitHub (Jun 24, 2024):
It looks like the postlight project has no recent activity unfortunately (no PR reviews at least)...
Is there any plan to replace it with something else, or integrate the existing Reddit and HN PRs in a different way?
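For anyone who wants to pick this up, one option that does not depend on postlight would be to shell out to the BDfR CLI, roughly the way the thread suggests copying archivebox/extractors/media.py. Below is a minimal sketch, not ArchiveBox's actual extractor interface: the helper names (`is_reddit_url`, `build_bdfr_command`, `save_reddit`), the `reddit` output subfolder, and the host list are all hypothetical, and the `archive`, `--link`, and `--format` arguments are assumed from BDfR's README and should be checked against the installed version.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlparse

# Assumption: hosts treated as "Reddit" for this sketch; adjust as needed.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com", "redd.it"}


def is_reddit_url(url: str) -> bool:
    """Return True if the URL's hostname is one of the known Reddit hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in REDDIT_HOSTS


def build_bdfr_command(url: str, out_dir: Path) -> List[str]:
    """Build the argv for a BDfR archive run (hypothetical helper).

    Flags are assumed from the BDfR README: `archive` writes the post and
    its comments to a file, `--link` selects one submission, and `--format`
    picks the output format.
    """
    return [
        sys.executable, "-m", "bdfr", "archive",
        str(out_dir / "reddit"),
        "--link", url,
        "--format", "json",
    ]


def save_reddit(url: str, out_dir: Path, timeout: int = 60) -> Optional[subprocess.CompletedProcess]:
    """Run BDfR for Reddit URLs; return None for anything else."""
    if not is_reddit_url(url):
        return None
    (out_dir / "reddit").mkdir(parents=True, exist_ok=True)
    return subprocess.run(
        build_bdfr_command(url, out_dir),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Keeping the command construction separate from the subprocess call makes it possible to unit-test the URL matching and arguments without BDfR installed or network access.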