[GH-ISSUE #345] Feature Request: Twitter Thread Archiver #1759

Open
opened 2026-03-01 17:53:25 +03:00 by kerem · 12 comments

Originally created by @shimizurei on GitHub (May 31, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/345

Can something like the Thread Reader App be incorporated into ArchiveBox?

Type

  • [x] Propose a brand new feature

What is the problem that your feature request solves

We can save Twitter threads (NOT individual Twitter posts) as functionally complete articles.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

A nice article PDF like the Thread Reader app.

What hacks or alternative solutions have you tried to solve the problem?

ThreadReader App (https://threadreaderapp.com/help)

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually

  • [ ] I'm willing to contribute dev time / money to fix this issue
  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@pirate commented on GitHub (Jun 1, 2020):

Yeah, I've wanted this for a long time too. The way it's been implemented in other projects is as a content script that unrolls threads before snapshotting inside headless Chrome.


@mAAdhaTTah commented on GitHub (Dec 13, 2020):

Would it be possible for the archiver to trigger the ThreadReader app to unroll it then archive the ThreadReader result?


@shimizurei commented on GitHub (Dec 14, 2020):

Then it ends up depending on ThreadReader. What if ThreadReader becomes defunct tomorrow?


@mAAdhaTTah commented on GitHub (Dec 14, 2020):

@shimizurei You'd have an archive of the ThreadReader page in your ArchiveBox.


@shimizurei commented on GitHub (Dec 14, 2020):

If it's part of ArchiveBox's code, then its life depends on the maintainers of ArchiveBox. ThreadReader isn't open source, so if it goes down tomorrow, that's it. Everyone will be scrambling to find a replacement because the code is not easily available. Yes, you'll have your already created archives, but you wouldn't be able to create any more.


@pirate commented on GitHub (Dec 14, 2020):

I'd rather do this via a Python library, CLI tool, or Puppeteer scripts (once our async Playwright worker system is out).

Follow here for updates on puppeteer script support progress: https://github.com/ArchiveBox/ArchiveBox/issues/51


@akmadian commented on GitHub (Nov 22, 2021):

I would really like this feature, and I'm willing to contribute code to make it happen, if that's welcome.


@pirate commented on GitHub (Nov 23, 2021):

There are still a lot of structural blockers in ArchiveBox's design to running content scripts directly during archiving.

The most helpful approach might be to write a dedicated extractor in Python that dumps the unrolled thread to a nicer HTML file? Look for existing tools structured like youtube-dl but for Reddit and Twitter (does a `thread-dl` exist?), and then clone the YOUTUBEDL extractor code to get started.


@jpaulickcz commented on GitHub (Jan 4, 2022):

I've been looking for a box with this functionality for a long while now, with no luck. The closest thing I've found to what I imagine is https://github.com/weskerfoot/TweetLog – however, that requires access to the developer API, which I don't have.

A regular thread – a sequence of tweets making a mini article (my god, what happened to good ol' blogs?) – can otherwise be archived quite easily with Thread Reader App (https://threadreaderapp.com/) by calling `https://threadreaderapp.com/thread/$TWIDENT.html`, where `$TWIDENT` is the ID of any tweet that's part of the thread, and then downloading it a few minutes later. However, I am looking for something that would be able to archive a tweet OR a thread, including all of the replies to one or more of the tweets included in said thread.
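
A minimal sketch of the URL scheme described in this comment, assuming the tweet ID is the numeric segment after `/status/` in a tweet URL. The helper names `tweet_id_from_url` and `threadreader_url` are hypothetical; only the `https://threadreaderapp.com/thread/$TWIDENT.html` pattern comes from the comment itself, and whether the service still answers at that address is not checked here.

```python
import re

# Matches the numeric tweet ID in URLs such as
# https://twitter.com/<user>/status/<id> (assumed URL shape).
STATUS_ID_RE = re.compile(r"/status(?:es)?/(\d+)")


def tweet_id_from_url(tweet_url: str) -> str:
    """Return the numeric tweet ID embedded in a tweet URL (hypothetical helper)."""
    match = STATUS_ID_RE.search(tweet_url)
    if match is None:
        raise ValueError(f"no tweet ID found in {tweet_url!r}")
    return match.group(1)


def threadreader_url(tweet_url: str) -> str:
    """Build the Thread Reader App URL for the thread containing this tweet."""
    return f"https://threadreaderapp.com/thread/{tweet_id_from_url(tweet_url)}.html"


if __name__ == "__main__":
    print(threadreader_url("https://twitter.com/mitchellh/status/1615797167607939072"))
```

The resulting URL could then be handed to ArchiveBox like any other page, with the caveat raised earlier in the thread that this keeps the workflow dependent on a third-party service.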


@onemenzel commented on GitHub (Apr 27, 2022):

ThreadReaderApp has been acquired by Twitter and shut down. I think a feasible approach would be to add a config option where a Twitter developer token can be entered, and then just download the thread and put it into a simple HTML file with one `<p>` (paragraph) tag per tweet, maybe `<br>` for newlines.

I myself would do it quick and dirty and just pretend the HTML was made by readability, but I can understand if that’s too much of a hack for you 😃

I also think that this feature is now more important than before because of the acquisition. I just archived ThreadReaderApp links before.
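
The "one `<p>` per tweet, `<br>` for newlines" idea can be sketched roughly as follows. This assumes the tweet texts have already been fetched (for example with a developer token, which is not shown here); `render_thread_html` and its `title` parameter are hypothetical names, not part of ArchiveBox.

```python
from html import escape
from typing import Iterable


def render_thread_html(tweets: Iterable[str], title: str = "Twitter thread") -> str:
    """Render tweet texts as a simple HTML page: one <p> per tweet, <br> per newline."""
    paragraphs = []
    for text in tweets:
        # Escape first so tweet content cannot inject markup, then turn
        # line breaks inside a tweet into <br> tags.
        safe = "<br>\n".join(escape(text).splitlines())
        paragraphs.append(f"<p>{safe}</p>")
    body = "\n".join(paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    print(render_thread_html(["First tweet\nwith two lines", "Second tweet: 1 < 2"]))
```

Escaping before inserting the `<br>` tags keeps the output safe to open in a browser even when a tweet contains angle brackets or ampersands.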


@pirate commented on GitHub (May 3, 2022):

How about Nitter?

https://twitter.com/ArchiveBoxApp -> https://nitter.net/ArchiveBoxApp
https://twitter.com/mitchellh/status/1615797167607939072 -> https://nitter.net/mitchellh/status/1615797167607939072
... etc
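
The mapping above amounts to swapping the hostname and leaving the rest of the URL untouched. A minimal sketch, assuming `nitter.net` as the instance (any other instance would be passed via the hypothetical `instance` parameter) and a hypothetical helper name `to_nitter`:

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames treated as Twitter; this set is an assumption, extend it as needed.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def to_nitter(url: str, instance: str = "nitter.net") -> str:
    """Rewrite a Twitter URL to the same path on a Nitter instance."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in TWITTER_HOSTS:
        # Leave non-Twitter URLs unchanged.
        return url
    return urlunsplit(parts._replace(netloc=instance))


if __name__ == "__main__":
    print(to_nitter("https://twitter.com/ArchiveBoxApp"))
    print(to_nitter("https://twitter.com/mitchellh/status/1615797167607939072"))
```

Keeping the path, query, and fragment intact means profile pages and individual statuses map the same way.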

@pirate commented on GitHub (Oct 20, 2023):

FYI we use Mercury (recently renamed `postlight`) as an extractor already, and they're rapidly adding extractors on their side for many different kinds of sites, so we should get these improvements with no effort required on the ArchiveBox side:

  • Reddit threads: https://github.com/postlight/parser/pull/746
  • HN threads: https://github.com/postlight/parser/pull/745
  • Twitter threads: https://github.com/postlight/parser/pull/622