mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #335] Archived sites fail to load resources with subresource integrity checks because of wget URL rewriting #1752
Originally created by @v3rmine on GitHub (Mar 28, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/335
Describe the bug
While archiving the GitHub page https://github.com/phsym/prettytable-rs, or any gist or GitHub page, the JS and CSS don't load. Only the HTML loads, so the webpage appears broken.
Steps to reproduce
I'm running ArchiveBox version c79ce2b1f with this config:
I cannot load the Local Archive.
Only the HTML loads correctly, and the console output shows this error:
Screenshots or log output
Software versions
83197ef
@pirate commented on GitHub (Mar 31, 2020):
Yeah, subresource integrity checks break archiving because wget rewrites URLs in source files to be relative, so the archived files no longer match the hashes declared in the page's integrity attributes. This is a known issue that's difficult to fix, so I recommend relying on the PDF and screenshot output more than the wget output.
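To illustrate why a rewritten file fails the check, here is a minimal Python sketch. It assumes the affected resource is a stylesheet whose `url()` reference was made relative. The file contents, the `example.com` URL, and the `sri_sha384` helper are all made up for illustration and are not taken from ArchiveBox or wget.

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Return an SRI-style value: 'sha384-' plus the base64 SHA-384 digest."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Hypothetical stylesheet as served by the original site.
original = b'body { background: url("https://example.com/static/bg.png"); }\n'

# The same stylesheet after its URL was rewritten to a relative path
# (an assumed example of the kind of rewriting described above).
rewritten = b'body { background: url("../static/bg.png"); }\n'

# The page's integrity attribute was computed from the original bytes.
declared = sri_sha384(original)

# The browser hashes the bytes it actually loads from the archive.
actual = sri_sha384(rewritten)

integrity_ok = declared == actual
print("declared:", declared)
print("actual:  ", actual)
print("integrity check passes:", integrity_ok)
```

Any change to the bytes, even a single rewritten URL, produces a different digest, so the browser refuses to apply the resource.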
Alternatively, you can try running the following in your archive folder to remove the subresource integrity checks from the archived HTML files (this is the BSD/macOS form of sed; with GNU sed, drop the '' after -i):
find ./ -name "*.html" -type f -exec sed -E -i '' 's/integrity="sha[^"]*"|crossorigin="anonymous"//g' {} \;
See: