[GH-ISSUE #591] Bugfix: media links are incorrect when the trailing slash is missing #366

Closed
opened 2026-03-01 14:42:56 +03:00 by kerem · 6 comments

Originally created by @berezovskyi on GitHub (Dec 24, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/591

Describe the bug

When I navigate to the media folder from /admin/core/snapshot/ as opposed to a single archived item, I am redirected to the /media folder and not /media/. The URL without the trailing slash produces broken links.

Steps to reproduce

  1. Go to /admin/core/snapshot/
  2. Pick any item and open media in a new tab:
     image
  3. Open the same item via its main link and then open media:
     image
  4. Open a link from the "main" media folder. It should work. E.g. /archive/1608817286.406686/media/introduction%20to%20using%20fish%20shell-w4C9oswxUM8.webm
  5. Open a link from the "snapshot" index media folder. It should be broken. E.g. /archive/1608817286.406686/introduction%20to%20using%20fish%20shell-w4C9oswxUM8.webm
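
The working and broken URLs in the last two steps match how a relative link resolves against a page path with and without a trailing slash. Here is a minimal Python sketch using the standard library's `urllib.parse.urljoin`. It assumes the media listing links to each file by its bare relative filename, which is an assumption about the generated page and not something confirmed from the ArchiveBox source:

```python
from urllib.parse import urljoin

# Relative link as it is assumed to appear in the media directory listing.
FILENAME = "introduction%20to%20using%20fish%20shell-w4C9oswxUM8.webm"


def resolve(page_path: str, relative_link: str = FILENAME) -> str:
    """Resolve a relative link against the page path, following RFC 3986 rules."""
    return urljoin(page_path, relative_link)


# With the trailing slash, "media/" is a directory, so the file stays inside it.
with_slash = resolve("/archive/1608817286.406686/media/")

# Without it, "media" is treated as the last path segment and gets replaced.
without_slash = resolve("/archive/1608817286.406686/media")

print(with_slash)
print(without_slash)
```

The second result drops the `media` segment, which is the same shape as the broken link reported above.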

Screenshots or log output

Software versions

  • OS: Ubuntu 20.04.1
  • ArchiveBox version: 0.5.0 docker latest
  • Python version: n/a docker
  • Chrome version: n/a

@berezovskyi commented on GitHub (Dec 24, 2020):

This is my first time looking at the archivebox codebase but I have three possible fix suggestions:

  1. add a slash here after the placeholder: https://github.com/ArchiveBox/ArchiveBox/blob/096749da8759e4435dfb049b4273783b5c2eb3f6/archivebox/index/html.py#L124
  2. ensure /media/ is prepended at https://github.com/ArchiveBox/ArchiveBox/blob/096749da8759e4435dfb049b4273783b5c2eb3f6/archivebox/index/html.py#L66 but only if rendering is done from the page without a slash
  3. force redirect /media to /media/

I guess 1st and 3rd options would be easiest and would make most sense. Thank you for the archivebox, Merry Christmas and happy holidays!
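
The third option (force a redirect from /media to /media/) could look roughly like the sketch below. `slash_redirect_target` is a hypothetical helper written for illustration, not a function in ArchiveBox or Django, and it assumes the caller already knows whether the requested path maps to a directory:

```python
from typing import Optional


def slash_redirect_target(request_path: str, is_directory: bool) -> Optional[str]:
    """Return the path to redirect to, or None when no redirect is needed.

    Hypothetical helper: directories requested without a trailing slash are
    redirected to the slash-terminated form so relative links resolve inside them.
    """
    if is_directory and not request_path.endswith("/"):
        return request_path + "/"
    return None


# A directory requested without the slash yields a redirect target.
print(slash_redirect_target("/archive/1608817286.406686/media", is_directory=True))
# A file, or a directory that already ends in "/", needs no redirect.
print(slash_redirect_target("/archive/1608817286.406686/media/", is_directory=True))
```

Whether a redirect like this is safe for all three index types discussed in this thread is not something the sketch addresses.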


@cdvv7788 commented on GitHub (Dec 24, 2020):

This is related to this: https://github.com/ArchiveBox/ArchiveBox/issues/487
I will give it a check next week.


@rpdillon commented on GitHub (Jan 22, 2021):

Just picked up ArchiveBox for the first time this past week, and I noticed the media links from the main index don't have a trailing slash, so the relative links in the directory all return a 404. I thought the extraction was broken, but as I started digging, I discovered it's just an issue of the trailing slash, so now I'm manually adding the trailing slash to access media links.

I took a look at the code, and it looks like the trailing slash could be added in schema.py:429, but I haven't yet vetted where else that value might be used, so I'm not sure if that's the correct fix.


@pirate commented on GitHub (Jan 25, 2021):

@rpdillon are you referring to the static main index ./data/index.html, the django public main index /public, or the admin main index /admin/core/snapshot/? The trailing slash behavior differs between them for important reasons, so adding a trailing slash in the code might fix one index but subtly break the other two. It's already been added and removed several times because of the differences between the methods, and doing so has often broken things in the past.


@rpdillon commented on GitHub (Jan 26, 2021):

Heya @pirate! Thanks for alerting me to these three cases! Only one has the trailing slash in my instance running version 0.5.3:

  • ✔️ ./data/index.html
  • ❌ /public
  • ❌ /admin/core/snapshot

So your suspicion seems correct: if we change schema.py it'll break one to fix the others. I'm happy to take a look over the next couple of days in the evenings to see if I can put up a PR to address this, I'll just have to set up a dev env for the project, so I thought I'd check here first in case it was an easier fix.


@pirate commented on GitHub (Jan 30, 2021):

I think I fixed this in v0.5.4, give it a try:

```bash
docker build -t archivebox:dev https://github.com/ArchiveBox/ArchiveBox.git#dev
docker run -v $PWD:/data archivebox:dev ...
```