mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #1518] Bug: wget fails on https://user:pass@domain/ URLs using HTTP basic auth #3920
Originally created by @agowa on GitHub (Sep 22, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1518
Describe the bug
archivebox update shows
but when I follow the steps to see the full output, it just works. The file it tried to download was a PDF from a web server protected by HTTP basic auth, with the credentials embedded in the URL.
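For readers unfamiliar with this URL form, here is a minimal sketch of how a URL with embedded credentials breaks down into a plain URL plus a username and password, and what the resulting HTTP basic auth header looks like. The helper names `split_credentials` and `basic_auth_header` are made up for illustration; this is not how ArchiveBox or wget handle such URLs internally, only a way to inspect one.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, user, password) for a user:pass@host URL.

    Hypothetical helper for illustration. Limitation: the host is rebuilt from
    urlsplit().hostname, so it is lowercased and IPv6 brackets are not restored.
    """
    parts = urlsplit(url)
    # Credentials in a URL may be percent-encoded, so decode them.
    user = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit(parts._replace(netloc=host))
    return clean_url, user, password


def basic_auth_header(user, password):
    """Build the value of an HTTP 'Authorization' header for basic auth."""
    raw = f"{user}:{password or ''}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Applied to the placeholder URL from this report, `split_credentials("https://user:pass@domain/+some/sub/dirs/filename.pdf")` yields the URL without the `user:pass@` part, plus `"user"` and `"pass"` as separate values.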
Steps to reproduce
I just installed ArchiveBox using the docker-compose steps in the README, and when I try to capture a site and then view it, ArchiveBox first shows a page that says
Then I basically ran that command (docker compose run archivebox update -t timestamp 1727017909.005329) and got this error:
but when I then want to see the full output and run these three commands I only get this:
And even if I check the exit code using echo $? afterwards, it only reports that the command was successful. The same happens when running it with --verbose instead of --no-verbose.
Edit: It also works when I run the provided command directly without an interactive TTY attached, so I don't think that's the issue here. Tested using
docker run --entrypoint "" --workdir "/data/archive/1727017909.005329" -v $PWD/data:/data archivebox/archivebox wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --timeout=60 --restrict-file-names=windows --warc-file=/data/archive/1727017909.005329/warc/1727018544 --page-requisites "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.2 (+https://github.com/ArchiveBox/ArchiveBox/) wget/GNU Wget 1.21.3" --compression=auto "https://user:pass@domain/+some/sub/dirs/filename.pdf"
Screenshots or log output
ArchiveBox version
@pirate commented on GitHub (Sep 22, 2024):
Can you confirm the error is repeatable if you retry it with #1 added to the end of the URL?
@agowa commented on GitHub (Sep 22, 2024):
Yes, I get the same error with the same URL plus a #1 suffix.
@pirate commented on GitHub (Sep 22, 2024):
Ok thanks, last question: can you try it with the latest archivebox/archivebox:dev? 0.7.2 is quite old at this point, and it might've already been fixed by one of the hundreds of changes since then (in particular wget version upgrades and improvements to the CLI argument requoting logic).
@agowa commented on GitHub (Sep 22, 2024):
Sorry, but archivebox/archivebox:dev doesn't work for me at all. At first I tried replacing the :latest in the docker-compose file with :dev, and after you edited your post to add the quicktest commands I also tried them. But the init fails and I only get this error:
@pirate commented on GitHub (Sep 22, 2024):
Ah sorry looks like the build I started last night before I went to bed never finished, give me a sec I'll fix it.
@agowa commented on GitHub (Oct 3, 2024):
@pirate any update on this one? Were you able to replicate the issue?
@pirate commented on GitHub (Oct 3, 2024):
Build is fixed, but I'm not sure if the original issue is. You can give it a try:
@agowa commented on GitHub (Oct 4, 2024):
Hi, sorry, but the dev image still doesn't work as you suggest. When I try to run it in a new and completely empty folder, the init fails:
@agowa commented on GitHub (Oct 4, 2024):
I moved the problems with spinning up the dev image into a separate issue, as I found a workaround by first doing the init using the latest image.
Regarding this issue, you're right, it still exists. The "add" command still claims "Wget failed or got an error from the server", but the "full output" command it provides succeeds.
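Since the thread hinges on one invocation reporting failure while the same command run by hand reports success, a minimal sketch for comparing the two is to run the exact argument list with no TTY attached and record the exit status and stderr, rather than judging by what is printed. The helper name `run_without_tty` is hypothetical, and nothing here reflects how ArchiveBox itself decides that wget failed. For reference, GNU Wget documents exit status 0 as success and 8 as "Server issued an error response".

```python
import subprocess
import sys


def run_without_tty(argv, cwd=None, timeout=120):
    """Run argv with stdin detached and return (exit_code, stdout, stderr).

    Hypothetical diagnostic helper. argv is passed as a list, so no shell
    quoting is applied to arguments such as URLs containing '@' or ':'.
    """
    completed = subprocess.run(
        argv,
        cwd=cwd,
        stdin=subprocess.DEVNULL,  # no interactive terminal on stdin
        capture_output=True,       # collect stdout and stderr separately
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout, completed.stderr
```

Passing the wget arguments from the report as a list (one element per argument, with the URL as the last element) would show whether the exit status differs from what `echo $?` reports in an interactive shell.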