[PR #1424] [MERGED] Path validation fixes for wget_output_path() #1421

Closed
opened 2026-03-01 14:49:43 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1424
Author: @pirate
Created: 5/7/2024
Status: Merged
Merged: 5/7/2024
Merged by: @pirate

Base: dev ← Head: path-validation-fixes


📝 Commits (2)

  • f62cb5f change wget to use stricter ascii filepath normalization
  • 9b21ce4 add workaround logic to catch paths that are too long or contain unprintable characters
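The second commit's idea (rejecting output paths that are too long or contain unprintable characters) can be illustrated with a minimal sketch. This is not the code from the PR: the function name `is_safe_output_path` and both byte limits are assumptions chosen for illustration, and real limits vary by filesystem.

```python
# Hypothetical illustration only; not the ArchiveBox implementation.
MAX_COMPONENT_BYTES = 255   # assumed per-name limit, common on many filesystems
MAX_PATH_BYTES = 4096       # assumed whole-path limit, common on Linux


def is_safe_output_path(path: str) -> bool:
    """Return True if `path` is non-empty, printable, and within the assumed limits."""
    if not path:
        return False
    # str.isprintable() is False for control characters such as "\n" or "\x00"
    if not path.isprintable():
        return False
    # Filesystem limits are measured in bytes, so check the UTF-8 encoded length
    if len(path.encode("utf-8")) > MAX_PATH_BYTES:
        return False
    # Each component between "/" separators must also fit the per-name limit
    return all(
        len(part.encode("utf-8")) <= MAX_COMPONENT_BYTES
        for part in path.split("/")
    )
```

A caller could fall back to a simpler, known-good filename whenever this check returns False, rather than trusting the computed path.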

📊 Changes

2 files changed (+94 additions, -39 deletions)

📝 archivebox/config.py (+1 -1)
📝 archivebox/extractors/wget.py (+93 -38)

📄 Description

Fixes https://github.com/ArchiveBox/ArchiveBox/issues/549
Fixes https://github.com/ArchiveBox/ArchiveBox/issues/1373

If you want a laugh, read the blood-sweat-and-tears docstring I added 😅


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
