[GH-ISSUE #761] COOKIES_FILE isn't used when fetching page titles, leading to saving captcha-page titles like "Before you continue to YouTube..." #3501
Originally created by @dansbandit on GitHub (Jun 5, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/761
Describe the bug
The title becomes 'Before you continue to YouTube' instead of the video title, because YouTube redirects to a cookie consent form. This could be solved if a cookie file could be passed to the curl command that is run.
Steps to reproduce
archivebox add https://www.youtube.com/watch?v=aP8sRCun63M
Screenshots or log output
N/A
ArchiveBox version
@SoraMakes commented on GitHub (Jun 11, 2021):
ArchiveBox provides a way to add a cookies file. I use the Docker image, and there I added the following environment variable for that: COOKIES_FILE=/data/cookies.txt
I think it is only used for media and wget.
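The file that COOKIES_FILE points to is handed to wget and the media downloader, and both of those tools read the Netscape "cookies.txt" format. Before pointing COOKIES_FILE at an exported file, it can be worth confirming that the file actually parses in that format and contains cookies for the site in question. The sketch below uses Python's standard http.cookiejar.MozillaCookieJar for this; the helper name summarize_cookies is hypothetical and is not part of ArchiveBox.

```python
import http.cookiejar
from collections import Counter


def summarize_cookies(path: str) -> Counter:
    """Parse a Netscape-format cookies.txt and count cookies per domain.

    Hypothetical helper (not part of ArchiveBox) for sanity-checking a file
    before setting COOKIES_FILE to it. Raises http.cookiejar.LoadError if the
    file does not start with the Netscape cookie-file header line.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and already-expired cookies so the count reflects
    # everything that is in the file, not just what would be sent right now.
    jar.load(ignore_discard=True, ignore_expires=True)
    return Counter(cookie.domain for cookie in jar)
```

For example, `summarize_cookies("/data/cookies.txt")` should report a non-zero count for `.youtube.com` if the export includes the YouTube consent cookies.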
@dansbandit commented on GitHub (Jun 15, 2021):
Yes, I've tried that environment variable, and it doesn't seem to affect the title.
@pirate commented on GitHub (Jun 18, 2021):
Unfortunately the cookies file does not apply to the title, so there's no easy way to get around this right now until we push a fix to use the cookies in download_url() (see archivebox/extractors/title.py). You'll have to edit the titles manually in the Admin to fix them, or try to stay under YouTube's rate limits so that you're not throttled and served captcha pages. You can always click "Pull Title" in the Admin UI to force re-fetching the title.
@dansbandit commented on GitHub (Jun 21, 2021):
If I recall correctly, the cookie consent form affects all European users regardless of rate limits.
In the meantime I will try to get the titles another way.
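The fix described in this thread is to send the cookies from COOKIES_FILE along with the request that fetches the page whose title gets saved. A minimal sketch of that idea follows, under stated assumptions: it is not ArchiveBox's actual download_url(), it uses only Python's standard library (urllib.request, http.cookiejar, html.parser) rather than whatever HTTP client ArchiveBox uses, and the names load_cookie_jar, download_url_with_cookies and extract_title are hypothetical.

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from typing import Optional


def load_cookie_jar(cookies_file: Optional[str]) -> http.cookiejar.MozillaCookieJar:
    """Load a Netscape-format cookies.txt (hypothetical helper); empty jar if None."""
    jar = http.cookiejar.MozillaCookieJar()
    if cookies_file:
        # ignore_discard keeps cookies without an expiry; expired ones are
        # still filtered out by the cookie policy when requests are made.
        jar.load(cookies_file, ignore_discard=True)
    return jar


def download_url_with_cookies(url: str, cookies_file: Optional[str] = None,
                              timeout: int = 60) -> str:
    """Fetch a URL, attaching any matching cookies from cookies_file."""
    jar = load_cookie_jar(cookies_file)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    title = "".join(parser.parts).strip()
    return title or None
```

Usage would look like `extract_title(download_url_with_cookies(url, "/data/cookies.txt"))`. Whether this actually gets past YouTube's consent page depends on the exported cookies still being valid, which this sketch does not check.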
@JoshMock commented on GitHub (Feb 5, 2023):
Would this still be a good first ticket? Looking to start making some contributions to the project, but want to get familiar with the codebase first.