mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #158] Archive Method: wget has issues when archiving gamestar.de #3130
Originally created by @Powerbless on GitHub (Mar 3, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/158
When I try to archive https://gamestar.de, the archived website (not the screenshot or PDF) looks like a very old HTML site. When I use wget locally to download the site with my own wget command, everything works fine. Does ArchiveBox use the "--execute robots=off" flag?
My wget command that works:
"C:\wget\wget.exe" --mirror -c --recursive --level 1 --timestamping --page-requisites --html-extension --convert-links --execute robots=off --directory-prefix=.\gamestar\ --span-hosts --domains=gamestar.de,www.gamestar.de https://www.gamestar.de
@pirate commented on GitHub (Mar 3, 2019):
This is the wget command that ArchiveBox uses, and we do ignore robots exclusions using the -e robots=off flag.
We don't do --mirror or --level 1 though, maybe you can test your command with those removed and the ArchiveBox one with those added. I can also experiment adding those flags and seeing how it affects wget behavior on other sites.
@pirate commented on GitHub (Jul 24, 2020):
Please give this a try on the latest django branch (which contains the latest wget, youtubedl, curl, etc versions), if you're still seeing issues comment back here and I'll reopen the ticket.
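For anyone who wants to run the comparison suggested in the first reply, the following is a minimal sketch of how the reporter's working command could be reduced by dropping --mirror and --level 1. The argument list is copied from the command quoted in the issue; the helper name without_flags and the arity mapping are hypothetical and exist only for this illustration. It does not reproduce the command ArchiveBox itself runs, since that command is not quoted in this thread.

```python
# The reporter's working wget arguments, copied from the issue text.
WORKING_ARGS = [
    "--mirror", "-c", "--recursive", "--level", "1", "--timestamping",
    "--page-requisites", "--html-extension", "--convert-links",
    "--execute", "robots=off", "--directory-prefix=.\\gamestar\\",
    "--span-hosts", "--domains=gamestar.de,www.gamestar.de",
    "https://www.gamestar.de",
]


def without_flags(args, flags_with_arity):
    """Return a copy of args with the given flags and their values dropped.

    flags_with_arity maps a flag to the number of separate values that
    follow it (hypothetical helper, for illustration only).
    """
    result = []
    skip = 0
    for arg in args:
        if skip:
            # This argument is a value belonging to a removed flag.
            skip -= 1
            continue
        if arg in flags_with_arity:
            skip = flags_with_arity[arg]
            continue
        result.append(arg)
    return result


# --mirror takes no value; --level is followed by one value ("1").
reduced = without_flags(WORKING_ARGS, {"--mirror": 0, "--level": 1})

# No argument contains spaces, so a plain join is enough for display.
print(" ".join(["wget", *reduced]))
```

Running the original command and the reduced one against the same URL, then comparing the saved pages, should show whether those two flags account for the difference in output.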