mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #345] Feature Request: Twitter Thread Archiver #247
Originally created by @shimizurei on GitHub (May 31, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/345
Can something like the Thread Reader App be incorporated into ArchiveBox?
Type
What is the problem that your feature request solves
We can save Twitter threads (NOT individual Twitter posts) as functionally complete articles.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
A nice article PDF like the one the Thread Reader App produces.
What hacks or alternative solutions have you tried to solve the problem?
ThreadReader App
How badly do you want this new feature?
@pirate commented on GitHub (Jun 1, 2020):
Yeah I've wanted this for a long time too. The way it's been implemented on other projects is as a content script that unrolls threads before snapshotting inside of chrome headless.
@mAAdhaTTah commented on GitHub (Dec 13, 2020):
Would it be possible for the archiver to trigger the ThreadReader app to unroll it then archive the ThreadReader result?
@shimizurei commented on GitHub (Dec 14, 2020):
Then it ends up depending on ThreadReader. What if ThreadReader becomes defunct tomorrow?
@mAAdhaTTah commented on GitHub (Dec 14, 2020):
@shimizurei You'd have an archive of the ThreadReader page in your ArchiveBox.
@shimizurei commented on GitHub (Dec 14, 2020):
If it's part of ArchiveBox's code, then its life depends on the maintainers of ArchiveBox. ThreadReader isn't open source, so if it goes down tomorrow, that's it. Everyone will be scrambling to find a replacement because the code is not easily available. Yes, you'll have your already created archives, but you wouldn't be able to create any more.
@pirate commented on GitHub (Dec 14, 2020):
I'd rather do this via a python library, CLI tool, or puppeteer scripts (once our async playwright worker system is out).
Follow here for updates on puppeteer script support progress: https://github.com/ArchiveBox/ArchiveBox/issues/51
@akmadian commented on GitHub (Nov 22, 2021):
I would really like this feature, and I'm willing to contribute code to make it happen, if that's welcome.
@pirate commented on GitHub (Nov 23, 2021):
There are still a lot of structural blockers in ArchiveBox's design to running content scripts directly during archiving.
The most helpful approach might be to write a dedicated extractor in Python that dumps the unrolled thread to a nicer HTML file? Look for existing tools structured like YouTube-dl but for Reddit and Twitter (does a `thread-dl` exist?), and then clone the YOUTUBEDL extractor code to get started.
@jpaulickcz commented on GitHub (Jan 4, 2022):
I've been looking for a box with this functionality for a long while now, with no luck. The closest thing I found to what I imagine is https://github.com/weskerfoot/TweetLog. However, that requires access to the developer API, which I don't have.
A regular thread (a sequence of tweets making a mini article; my god, what happened to good ol' blogs?) can otherwise be archived quite easily with Thread Reader App, by calling `https://threadreaderapp.com/thread/$TWIDENT.html`, where `$TWIDENT` is the ID of any tweet that's part of the thread, and then downloading it a few minutes later. However, I am looking for something that would be able to archive a tweet OR a thread, including all of the replies to one or more of the tweets included in said thread.
@onemenzel commented on GitHub (Apr 27, 2022):
ThreadReaderApp has been acquired by Twitter and shut down. I think a feasible approach would be to add a config option where a Twitter developer token can be entered, and then just download the thread and put it into a simple HTML file with one `<p>`aragraph tag per tweet, maybe `<br>` for newlines.
I myself would do it quick and dirty and just pretend the HTML was made by readability, but I can understand if that’s too much of a hack to you 😃
I also think that this feature is now of higher importance than before because of the acquisition. Until now, I just archived ThreadReaderApp links.
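A rough sketch of the one-paragraph-per-tweet idea above, assuming the tweet texts have already been fetched by some other means (API token, scraper, or otherwise). The function name `render_thread` and the plain list-of-strings input are illustrative assumptions, not part of ArchiveBox:

```python
from html import escape


def render_thread(tweets):
    """Render tweet texts as a minimal HTML page, one <p> per tweet.

    `tweets` is assumed to be a list of plain-text strings in thread order.
    """
    paragraphs = []
    for text in tweets:
        # Escape HTML-special characters first, then turn newlines into <br>.
        body = escape(text).replace("\n", "<br>\n")
        paragraphs.append(f"<p>{body}</p>")
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        + "\n".join(paragraphs)
        + "\n</body></html>\n"
    )


if __name__ == "__main__":
    print(render_thread(["First tweet\nwith a line break", "Second tweet"]))
```

Escaping before inserting the `<br>` tags keeps any markup-like text in a tweet from being interpreted as HTML in the saved file.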
@pirate commented on GitHub (May 3, 2022):
How about Nitter?
@pirate commented on GitHub (Oct 20, 2023):
FYI we use Mercury (recently renamed `postlight`) as an extractor already, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the archivebox side: