mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #210] Can't write wget snapshots of URL with query-string #1653
Originally created by @bltavares on GitHub (Apr 5, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/210
Describe the bug
Most Linux filesystems allow ? and other special characters in folder and file names. But if you run this using ExFAT as the storage, it throws the following error:
ExFAT is a common filesystem for portable USB drives, especially when a drive is meant to be shared between Linux, Windows and macOS. It is not the best archival filesystem (e.g. ZFS), but it is the most portable one.
I'm not sure how to do that yet, but it would be nice to convert special characters into safer ones when writing files to disk.
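One possible way to do that conversion is to percent-escape every character ExFAT rejects in a file name (the ASCII control range plus " * / : < > ? \ |). The Python sketch below assumes names are escaped one path component at a time. `safe_filename` and `restore_filename` are hypothetical helpers, not ArchiveBox functions. It also escapes % itself, so the mapping stays reversible.

```python
from urllib.parse import unquote

# Hypothetical helper set, not part of ArchiveBox: characters ExFAT rejects in
# file names (control characters 0x00-0x1F and " * / : < > ? \ |), plus '%'
# so that every escape can be decoded unambiguously later.
EXFAT_RESERVED = set('"*/:<>?\\|%') | {chr(c) for c in range(0x20)}


def safe_filename(name: str) -> str:
    """Replace each reserved character in one path component with %XX (uppercase hex)."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in EXFAT_RESERVED else ch
        for ch in name
    )


def restore_filename(safe: str) -> str:
    """Undo safe_filename by decoding the %XX escapes."""
    return unquote(safe)
```

For example, `safe_filename("watch?v=BkW1xQgrSPQ")` gives `watch%3Fv=BkW1xQgrSPQ`, which ExFAT accepts. Percent-escaping reuses the same convention wget applies to characters it cannot write.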
Steps to reproduce
cd into it
echo 'www.youtube.com/watch?v=BkW1xQgrSPQ' | ./archive
Software versions
296aa767078f
@pirate commented on GitHub (Apr 11, 2019):
Hmm interesting, looks like we'll have to switch from --restrict-file-names=unix to --restrict-file-names=windows now that we support Windows and Windows file systems. When set to windows, wget's filename escaping is a superset of the escaping that unix does, so it'll continue to work on existing systems but make all newly archived files ExFAT/NTFS compatible.
https://www.gnu.org/software/wget/manual/wget.html#targetText=--restrict-file-names=modes
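The GNU Wget manual describes what windows mode does to local file names. It escapes \ | / : ? " * < > and control characters as %XX. It also uses + instead of : between host and port, and @ instead of ? before the query. The Python sketch below roughly approximates that mapping and shows what a query-string URL turns into on disk. `windows_local_path` is a hypothetical helper, not wget or ArchiveBox code. It expects a URL with a scheme. It ignores control characters 128-159, existing percent-escapes, index.html for directory URLs, and options such as --adjust-extension.

```python
from urllib.parse import urlsplit

# Characters wget's "windows" mode escapes according to the GNU Wget manual,
# plus ASCII control characters; the 128-159 range is left out of this sketch.
WINDOWS_ESCAPED = set('\\|/:?"*<>') | {chr(c) for c in range(0x20)}


def _escape(segment: str) -> str:
    """Percent-escape characters that windows mode does not allow in a name."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in WINDOWS_ESCAPED else ch
        for ch in segment
    )


def windows_local_path(url: str) -> str:
    """Approximate the relative path wget writes with --restrict-file-names=windows."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host += "+" + str(parts.port)  # '+' replaces ':' between host and port
    path = "/".join(_escape(seg) for seg in parts.path.split("/"))
    if parts.query:
        path += "@" + _escape(parts.query)  # '@' replaces '?' before the query
    return host + path
```

With the manual's own example, `windows_local_path("http://www.xemacs.org:4300/search.pl?input=blah")` returns `www.xemacs.org+4300/search.pl@input=blah`. The URL from this issue, `https://www.youtube.com/watch?v=BkW1xQgrSPQ`, maps to `www.youtube.com/watch@v=BkW1xQgrSPQ`. Any code that predicts where wget saved a page has to expect these names once the flag changes.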
@pirate commented on GitHub (Apr 11, 2019):
Done
github.com/pirate/ArchiveBox@4f599c0b0b
@bltavares commented on GitHub (Apr 11, 2019):
Thank you! 🙇
@bltavares commented on GitHub (Apr 12, 2019):
@pirate I've tried the latest published Docker image, which should include the fix, but the error log still reports unix.
Looking at the code, it might not be enough to only add the flag; wget_output_path might need to change as well.
@bltavares commented on GitHub (Apr 12, 2019):
I've noticed that the Dockerfile clones the latest commit from GitHub instead of copying the project, with its local changes, into the image. This can cause confusion about why a change is not being built. The clone step is cached as a layer by Docker automated builds, and because the command text doesn't change, the cache is never invalidated and the image is not updated.
I'll send a PR soon (unless it's intentional) :)