mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #136] Lots of 403 Forbiddens #3113
Originally created by @sbrl on GitHub (Jan 31, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/136
Describe the bug
I'm getting a bunch of random 403 Forbidden errors. The curious thing, though, is that the resulting archive output looks to be complete.
This still happens if I tweak archivebox to invoke `wget` with `-e robots=off`, and if I go 'superstealth' - see the script below.

Here's an example URL: https://www.destroyallsoftware.com/talks/wat
Here's the pair of scripts I'm using to archive things:
archive-custom
archive-url
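(The `archive-custom` and `archive-url` scripts themselves were attached on GitHub and are not included inline here. For orientation only, a generic "stealth" wget invocation of the kind being discussed might be sketched as follows; the flag choices are common wget options, not the author's actual scripts:)

```shell
#!/bin/sh
# Hypothetical sketch of a "stealth" archive command (NOT the author's
# archive-custom/archive-url scripts). -e robots=off tells wget to
# ignore robots.txt; --user-agent spoofs a browser UA, which reduces
# (but does not eliminate) 403 Forbidden responses from some servers.
url="${1:-https://www.destroyallsoftware.com/talks/wat}"
ua="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"

# Build the command as a string and print it instead of running it,
# so the sketch is safe to execute without network access.
cmd="wget --page-requisites --convert-links --adjust-extension \
-e robots=off --user-agent=\"$ua\" \"$url\""

echo "$cmd"
```

Even with flags like these, individual sub-resources can still return 403, which is the behavior described in the comment below.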
Steps to reproduce
Steps to reproduce the behavior:
Screenshots or log output
If applicable, use screenshots or copy/pasted terminal output to help explain your problem.
Software versions (please complete the following information):
@pirate commented on GitHub (Feb 1, 2019):
That's actually fairly normal: often one or two resources on a page will be blocked for various reasons, but that doesn't prevent the rest of the page from downloading.
Generally, if it says `FINISHED` with some megabytes downloaded in the output, then it ran successfully.

@sbrl commented on GitHub (Feb 2, 2019):
Ah ok! Thanks for the clarification :-)