[GH-ISSUE #244] Behavior: Use relative paths in index.json metadata to avoid leaking full filesystem layout information #1680

Open
opened 2026-03-01 17:52:48 +03:00 by kerem · 4 comments
Owner

Originally created by @Syd on GitHub (May 26, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/244

ArchiveBox has a lot of potential for researchers, activists, journalists, etc. However, it currently leaks quite a bit of information unnecessarily, including directory structures, usernames, and more.

An example from index.json:

"/home/{username}/ArchiveBox/output/sources/stdin-1558782247.txt"

There is no real reason to give away directory structure and usernames in public archives. Some users are likely to be targeted by people unhappy with what is being archived, and exposing the webserver layout and the archiver's own environment makes that easier.


@pirate commented on GitHub (May 27, 2019):

I considered this originally when designing it, but decided against it because I had a use case where I was archiving bookmarks from multiple users and wanted paths like this:

  • /home/someuser/Downloads/bookmarks.html
  • /home/someotheruser/Downloads/bookmarks.html

Maybe it could be a config option, or I can just do relative paths by default and let people deal with that issue themselves.
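
A minimal sketch of how the config-option idea might look, not ArchiveBox's actual implementation. The function name `to_index_path` and the `relative` flag are hypothetical. It assumes POSIX-style paths: anything inside the output directory is written relative to it, and anything outside it (such as the per-user bookmark files above) is left unchanged.

```python
from pathlib import PurePosixPath


def to_index_path(path: str, output_dir: str, relative: bool = True) -> str:
    """Return the form of `path` that would be stored in index metadata.

    `relative` stands in for a hypothetical config option. When it is off,
    the path is stored exactly as given.
    """
    if not relative:
        return path
    try:
        # relative_to() raises ValueError when `path` is not inside `output_dir`.
        return str(PurePosixPath(path).relative_to(PurePosixPath(output_dir)))
    except ValueError:
        # Outside the output directory: keep the original path untouched.
        return path
```

With an output directory of `/home/alice/ArchiveBox/output` (an invented example), a source file under `sources/` would be stored as `sources/stdin-1558782247.txt`, while `/home/someuser/Downloads/bookmarks.html` would stay absolute.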


@cdvv7788 commented on GitHub (Oct 20, 2020):

With the change to the SQLite database, is this still an issue? The index.json is not generated automatically, and the SQL index does not have absolute paths anywhere. I can double check, but I don't think the web UI has the issue either.


@pirate commented on GitHub (Oct 22, 2020):

I believe absolute paths are still used in the archive/<timestamp>/index.json files, but I could be wrong.
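
One way to check this could be a small script like the sketch below. It makes no assumption about the index.json schema: it walks every string value in the file and flags those that look like absolute POSIX paths. The helper names and the "starts with a single slash" test are assumptions and only a heuristic, so results would need a manual look.

```python
import json
from pathlib import Path
from typing import Any, Iterator, List


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_absolute_paths(index_file: Path) -> List[str]:
    """Return string values that look like absolute POSIX paths (heuristic)."""
    data = json.loads(index_file.read_text(encoding="utf-8"))
    return [
        s for s in iter_strings(data)
        if s.startswith("/") and not s.startswith("//")
    ]


if __name__ == "__main__":
    # Assumes it is run from the directory that contains archive/.
    for index_file in sorted(Path("archive").glob("*/index.json")):
        for hit in find_absolute_paths(index_file):
            print(f"{index_file}: {hit}")
```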


@cdvv7788 commented on GitHub (Oct 22, 2020):

Good point. I will double check.
