mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #1568] Bug: archived YouTube videos aren't accessible #2446
Originally created by @arielelkin on GitHub (Oct 25, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1568
Describe the bug
Adding a YouTube video appears to archive it, but the video isn't accessible via the UI.
Steps to reproduce
Snapshot shows as "Pending" permanently:

It is impossible to access the actual archived video. When clicking wget, the cookie message won't go away:

Screenshots or log output
Log
ArchiveBox version
version 0.7.2
@pirate commented on GitHub (Oct 25, 2024):
Please post the full output of `archivebox version`.
@arielelkin commented on GitHub (Oct 28, 2024):
@pirate commented on GitHub (Oct 28, 2024):
Ah ok, you're using the old Docker image. It's over a year old at this point, so `yt-dlp` isn't on the latest version and is likely failing due to changes to YouTube.
You're welcome to try the latest BETA `archivebox/archivebox:dev`, upgrade `yt-dlp` within the container with `apt` manually, or wait for the upcoming v0.9.0 stable release to arrive.
@arielelkin commented on GitHub (Oct 29, 2024):
I'm using ArchiveBox via pikapod.net, so I have no control over the container and will wait for the next stable release.
Thanks!
@minosimo commented on GitHub (Nov 6, 2024):
I am seeing the same behavior on the dev docker image.
@pirate commented on GitHub (Nov 6, 2024):
@minosimo can you confirm you're not seeing any videos in the `data/archive/<timestamp>/media/` folders of snapshots of youtube.com URLs? Can you share the `data/archive/<timestamp>/index.json` from one of those captures that you're expecting to see videos in?
To be clear: YouTube videos are never playable inside the native YouTube UI in the captures. They're extracted out as `.mp4` files, are labelled as `Media` with the 📼 icon in the UI, and are findable under the `media/` folder in the filesystem.
@minosimo commented on GitHub (Nov 7, 2024):
Yes, the media folder is empty. I tried with several YouTube URLs, but it looks like the `yt-dlp` command fails.
index.json
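For anyone wanting to check this across many snapshots at once, here is a minimal Python sketch that lists snapshot folders whose `media/` folder is missing or empty. It only relies on the `data/archive/<timestamp>/media/` layout described above. The function name is made up for illustration, and it does not filter by URL, so non-YouTube snapshots will show up too.

```python
from pathlib import Path


def snapshots_without_media(archive_dir):
    """Return snapshot folder names whose media/ folder is missing or has no files."""
    empty = []
    for snapshot in sorted(Path(archive_dir).iterdir()):
        if not snapshot.is_dir():
            continue
        media = snapshot / "media"
        # rglob("*") also walks subfolders, in case downloads are nested
        has_files = media.is_dir() and any(p.is_file() for p in media.rglob("*"))
        if not has_files:
            empty.append(snapshot.name)
    return empty


if __name__ == "__main__":
    # Assumes the script is run from the folder that contains data/
    archive = Path("data/archive")
    if archive.is_dir():
        for name in snapshots_without_media(archive):
            print(name)
```

An empty `media/` folder only says that nothing was saved; the reason still has to come from running the extractor command by hand.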
@nguyenmp commented on GitHub (Nov 11, 2024):
I personally ran into issues with `yt-dlp` as well and ran the plugin directly to see what the problem was. You can find the full `yt-dlp` command in your index.json file, but you can run it through docker with:
I found I was hitting https://github.com/yt-dlp/yt-dlp/issues/10128 because I was running ArchiveBox on DigitalOcean and YouTube seems to be blocking their whole IP range now. My workaround was to set up a proxy and route `yt-dlp` traffic through that.
Try running the command directly, you might be hitting the same issue, or maybe something different.
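As a rough sketch of the two steps above (pulling the recorded command out of index.json, then routing it through a proxy), something like the following could work. The index.json layout is an assumption here: a top-level `history` object keyed by extractor name, with each entry storing its argument list under `cmd`. Check your own file before relying on it. The function names and the proxy address are hypothetical; `--proxy` itself is a standard `yt-dlp` option.

```python
import json
import shlex
from pathlib import Path


def recorded_media_commands(index_path):
    """Return the command lists recorded for the media extractor.

    ASSUMPTION: index.json has a "history" object keyed by extractor name,
    and each entry stores its argv list under "cmd". Adjust to your file.
    """
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    entries = index.get("history", {}).get("media", [])
    return [entry["cmd"] for entry in entries if entry.get("cmd")]


def with_proxy(cmd, proxy_url):
    """Return a copy of cmd with yt-dlp's --proxy option placed after the program name."""
    return [cmd[0], "--proxy", proxy_url, *cmd[1:]]


if __name__ == "__main__":
    # Stand-in command and hypothetical proxy address, for illustration only
    demo = ["yt-dlp", "https://example.com/watch"]
    print(shlex.join(with_proxy(demo, "socks5://127.0.0.1:1080")))
```

Printing the command with `shlex.join` gives a line that can be pasted into a shell inside the container, which keeps the actual download step manual and easy to inspect.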
@nguyenmp commented on GitHub (Nov 11, 2024):
Also, might be worth showing the "standard error" when a plugin command fails. It would make debugging a lot easier.
@pirate commented on GitHub (Nov 11, 2024):
We used to show `stdout`/`stderr` when an extractor failed, but even trying to summarize it was quite noisy, and too many people would open issues because they didn't understand that some errors are inevitable with some URLs.
Now we just show the command needed to run to get the full `stdout`/`stderr`. I find it's easier for people to debug that way, and many people solve issues on their own when they see the command is wrong or when there is some environment issue.
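For readers who would rather script that step than paste the command into a shell, here is a small Python sketch that runs a command and keeps its `stdout` and `stderr` separate. It is not how ArchiveBox runs extractors internally; the helper names are made up, and the demo command is a stand-in to be replaced with the command ArchiveBox shows.

```python
import subprocess
import sys


def run_and_capture(cmd, cwd=None, timeout=600):
    """Run cmd (an argv list) and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # a non-zero exit is expected here, so don't raise
    )
    return result.returncode, result.stdout, result.stderr


def last_lines(text, count=20):
    """Return the last `count` lines of text; the tail often carries the actual error."""
    return "\n".join(text.splitlines()[-count:])


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere
    demo = [sys.executable, "-c",
            "import sys; print('out'); print('boom', file=sys.stderr); sys.exit(1)"]
    code, out, err = run_and_capture(demo)
    print(f"exit code: {code}")
    print(last_lines(err))
```

Showing only the tail of `stderr` is one way to cut down the noise mentioned above while still surfacing the final error message.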