mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #611] Question: Adding any options to [ARCHIVE_METHOD_OPTIONS] causes wget to fail #3398
Originally created by @winteriscariot on GitHub (Jan 10, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/611
If I add anything to the [ARCHIVE_METHOD_OPTIONS] section it causes wget to throw an exception. Here is my current ArchiveBox.conf:
If I just have one option (such as COOKIES_FILE) it still fails. It does NOT fail if I remove the COOKIES_FILE and WGET_USER_AGENT from the ArchiveBox.conf. With the above config I get the following exception thrown, about halfway through the wget process:
I'm running on an up-to-date Arch Linux install (updated this morning to try and fix it). I installed archivebox via pip, and it is version 5.3.
wget version (just the default from the arch repos):
Unfortunately I'm not familiar enough with Python to debug this myself, or even to have a good idea whether this is a bug in archivebox or a config or dependency issue. Is there something obvious that I should be looking at for this?
Any help would be awesome, thanks!
@winteriscariot commented on GitHub (Jan 10, 2021):
Here's stdout of archiving that same link with both ARCHIVE_METHOD_OPTIONS removed:
functional ArchiveBox.conf:
Successful wget stdout:
Unfortunately I need my cookies.txt file for archiving some logged in sites, and at least one site throws a 403 whenever I connect with the default ArchiveBox user agent, so not using those options isn't a great choice.
I labeled this as a question mostly cuz I'm not sure if this is a bug or a config issue
@winteriscariot commented on GitHub (Jan 10, 2021):
Just confirmed that the same behavior occurs with the same config on a different Arch machine. Are my options maybe malformed?
@pirate commented on GitHub (Jan 11, 2021):
Strange, I haven't seen this failure mode before. Can you try removing the quotes from your User Agent config line?
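For context on why quotes could matter at all, here is a minimal sketch using Python's standard `configparser`, which keeps surrounding quotes as part of the value. Whether ArchiveBox parses its config the same way is an assumption that this thread does not confirm, and the `ExampleAgent/1.0` value and the `read_user_agent` helper are hypothetical.

```python
import configparser


def read_user_agent(conf_text):
    """Parse INI text and return the raw WGET_USER_AGENT value."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    return parser["ARCHIVE_METHOD_OPTIONS"]["WGET_USER_AGENT"]


# Hypothetical config fragments, not the reporter's actual file.
QUOTED = '[ARCHIVE_METHOD_OPTIONS]\nWGET_USER_AGENT = "ExampleAgent/1.0"\n'
UNQUOTED = "[ARCHIVE_METHOD_OPTIONS]\nWGET_USER_AGENT = ExampleAgent/1.0\n"

quoted_value = read_user_agent(QUOTED)      # quotes are kept in the value
unquoted_value = read_user_agent(UNQUOTED)  # bare string, no quotes
```

If the quoted form reaches wget unchanged, the literal quote characters would become part of the user agent string, which is one reason to try the unquoted form first.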
@winteriscariot commented on GitHub (Jan 12, 2021):
It actually may be my cookies file that was causing this issue, not the wget useragent. I left the useragent and removed the cookie file and it works.
I'll have to verify that the cookies text file is formatted correctly. Closing, thanks!
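One way to check the cookies file format, as a rough sketch: Python's standard `http.cookiejar.MozillaCookieJar` reads the Netscape `cookies.txt` format and raises an error on a missing header or malformed line. This assumes the file is meant to be in that Netscape format; the `check_cookies_file` helper name is hypothetical and not part of ArchiveBox.

```python
import http.cookiejar


def check_cookies_file(path):
    """Return (ok, message) after trying to parse a Netscape cookies.txt file."""
    jar = http.cookiejar.MozillaCookieJar()
    try:
        # Keep session and expired cookies so only the format is being tested.
        jar.load(path, ignore_discard=True, ignore_expires=True)
    except OSError as exc:
        # LoadError (bad format) is a subclass of OSError, so this also
        # covers a missing or unreadable file.
        return False, str(exc)
    return True, f"{len(jar)} cookie(s) parsed"
```

Calling `check_cookies_file("cookies.txt")` returns `False` plus the parser's error message when the file lacks the `# Netscape HTTP Cookie File` header or contains a line that is not seven tab-separated fields.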