mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #1086] Bug: Not archiving Twitter correctly #2189
Originally created by @m-primo on GitHub (Jan 19, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1086
Describe the bug
No screenshot, SingleFile, or output.html output is saved.
The archived pages don't contain the tweet itself either, only Twitter's error message: "Hmm...this page doesn’t exist. Try searching for something else."
Check the screenshot.
Steps to reproduce
It doesn't work even on your own demo instance!
Screenshots or log output
ArchiveBox version
@m-primo commented on GitHub (Jan 19, 2023):
btw, I tried saving tweets with headless Chromium and got the same result.
@pirate commented on GitHub (Jan 21, 2023):
Yup, you should archive the equivalent Nitter URLs (or use another alternative front-end instead of Twitter). Archiving Twitter directly has always been very broken. The same is true for Reddit -> Teddit, Instagram -> Bibliogram, and a couple of other big sites that implement advanced bot detection and blocking; see a longer list of alternative front-ends here: https://hackmd.io/MCpUlTbLThyF6cw_fywT_g?view. It's not ideal, but it's better than having no solution at all.
Follow here for updates: https://github.com/ArchiveBox/ArchiveBox/issues/345
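A minimal sketch of the workaround described above: rewrite a Twitter (or Reddit) URL to the equivalent alternative front-end URL before handing it to ArchiveBox. The front-end hosts `nitter.net` and `teddit.net`, the `FRONTEND_HOSTS` table, and the `to_frontend_url` helper are all hypothetical illustrations, not part of ArchiveBox. Public instances come and go, so substitute one that is currently up. The sketch assumes the front-end serves the same path and query string as the original site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical front-end hosts (assumption): public Nitter/Teddit instances
# come and go, so replace these with one that is currently up or self-hosted.
NITTER_HOST = "nitter.net"
TEDDIT_HOST = "teddit.net"

# Map each blocked origin host to its assumed alternative front-end host.
FRONTEND_HOSTS = {
    "twitter.com": NITTER_HOST,
    "www.twitter.com": NITTER_HOST,
    "mobile.twitter.com": NITTER_HOST,
    "reddit.com": TEDDIT_HOST,
    "www.reddit.com": TEDDIT_HOST,
    "old.reddit.com": TEDDIT_HOST,
}


def to_frontend_url(url: str) -> str:
    """Rewrite a Twitter/Reddit URL to the equivalent front-end URL.

    Hypothetical helper. It assumes the front-end mirrors the original
    site's paths (e.g. /<user>/status/<id> on Nitter). URLs on any other
    host are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    frontend = FRONTEND_HOSTS.get(host)
    if frontend is None:
        return url
    # Keep path, query, and fragment; swap only the scheme and host.
    return urlunsplit(("https", frontend, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    for u in [
        "https://twitter.com/jack/status/20",
        "https://mobile.twitter.com/jack/status/20?s=20",
        "https://example.com/page",
    ]:
        print(to_frontend_url(u))
```

The rewritten URL can then be archived as usual, e.g. `archivebox add "https://nitter.net/jack/status/20"`. Note that the snapshot will be stored under the front-end URL, not the original twitter.com URL.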
@m-primo commented on GitHub (Jan 22, 2023):
That's what I thought at first, but I opened this issue in case anyone can help or find a solution. I've tried many archiving tools and some workarounds, and I guess the only one that worked was pywb. But thanks, I'll take a look at the link in your reply.
@pirate commented on GitHub (Jan 22, 2023):
Yeah if you're doing a lot of twitter/fb/insta/etc. archiving I highly recommend https://github.com/webrecorder/browsertrix-crawler, it uses the same engine as pywb and is written by the same team.
Check out their whole suite here: https://webrecorder.net/
@m-primo commented on GitHub (Jan 23, 2023):
Okay, thank you so much.