[GH-ISSUE #237] Architecture: Strip all Javascript from static html archives by default #3184

Closed
opened 2026-03-14 21:29:16 +03:00 by kerem · 2 comments

Originally created by @noirscape on GitHub (May 9, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/237

Type

  • [ ] General Question or Discussion
  • [x] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

Some websites use JavaScript to redirect any saved pages to the original site, thereby breaking archiving of pages on the site in question.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Ideally, an option to scan the JavaScript in each downloaded file and prevent it from setting window.location in any form.

Since JS can be obfuscated in all sorts of ways, perhaps an option to simply strip JavaScript out of downloaded files would also be useful, and more reasonable to implement.
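
A minimal sketch of the "simply strip out JavaScript" option, using only Python's standard-library `html.parser`. This is not ArchiveBox's implementation; `ScriptStripper` and `strip_scripts` are hypothetical names. It drops `<script>` elements, inline `on*` handler attributes, and attribute values starting with `javascript:`. It makes no attempt to cover other redirect vectors (for example `<meta http-equiv="refresh">`), and a later comment in this thread notes that stripping is inherently fraught with security risk, so treat it as an illustration only.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-serialize HTML, dropping <script> elements and inline JS hooks."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def _emit_tag(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, ... (coarse filter)
                continue
            if value is not None and value.strip().lower().startswith("javascript:"):
                continue
            # The parser unescapes attribute values, so escape them again.
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Script bodies arrive here as raw text; skip them.
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html_text):
    """Return html_text with script elements and inline JS hooks removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_scripts('<p onclick="x()">Hi</p><script>window.location="http://example.com"</script>')` keeps the paragraph but drops both the handler and the redirecting script.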

What hacks or alternative solutions have you tried to solve the problem?

Currently, the only real workaround is to open the offending HTML files myself and remove the redirecting JavaScript from the <script> tags.

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [x] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually

  • [ ] I'm willing to contribute to development / fixing this issue
  • [x] I like ArchiveBox so far / would recommend it to a friend

@pirate commented on GitHub (May 18, 2019):

~~Yeah we're definitely adding this soon; it's currently a huge security issue to allow archived pages to run JS in a shared context, especially when opened via the filesystem.~~

~~I've officially made this a blocker to v0.4 due to the urgency: https://github.com/pirate/ArchiveBox/pull/207#issuecomment-494107553 but I can't promise I'll get around to it soon. In the meantime I'm adding notices to the README and wikis telling people not to use it for private content and to beware of potential JS execution reading from the filesystem / XSS-ing the archive.~~

This issue has evolved over time and can be tracked here now: https://github.com/ArchiveBox/ArchiveBox/issues/239


@pirate commented on GitHub (Jan 20, 2024):

Our new solution moving forward will likely involve serving untrusted JS from a different port and adding CSP/CORS/etc. headers, as JS sanitizing/stripping is inherently fraught with security risk and doesn't provide the best user experience.

Follow here for updates: https://github.com/ArchiveBox/ArchiveBox/issues/239
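
The separate-port idea from the comment above can be sketched with Python's standard-library `http.server`. This is an illustration under stated assumptions, not ArchiveBox's actual design: `ARCHIVE_HEADERS`, `ArchiveHandler`, and `make_archive_server` are hypothetical names, and the header values are one plausible choice. Serving archived snapshots on their own port gives them a different origin from the main UI, and the CSP `sandbox allow-scripts` directive lets page scripts run while treating the document as a unique opaque origin. CORS headers, which the comment also mentions, are not covered here.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical policy, chosen for illustration only.
ARCHIVE_HEADERS = {
    # Scripts may run, but the page is sandboxed into an opaque origin.
    "Content-Security-Policy": "sandbox allow-scripts",
    # Stop browsers from guessing a different content type.
    "X-Content-Type-Options": "nosniff",
}


class ArchiveHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds restrictive headers to every response."""

    def end_headers(self):
        for name, value in ARCHIVE_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def make_archive_server(directory, port=0):
    """Build a server for `directory` on its own port (0 picks a free one)."""
    handler = partial(ArchiveHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_archive_server("/path/to/archive", 8001).serve_forever()` would serve the snapshots on a port separate from the main web UI; the path and port here are placeholders.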
