[GH-ISSUE #854] Depth=1 does not get the right URLs if the main domain redirects #528

Open
opened 2026-03-01 14:44:20 +03:00 by kerem · 0 comments

Originally created by @alex9099 on GitHub (Sep 21, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/854

I was trying to archive a website that redirects (qsl.net). ArchiveBox builds the outlink URLs from this original base URL, so the resulting URLs are not valid.

In this case, adding https://qsl.net archives the right main page (it redirects to https://admin.qsl.net/index.php), but the depth=1 links, which live on https://admin.qsl.net/<whatever>, are fetched from https://qsl.net/<whatever> instead, which returns a 404.

For example, https://qsl.net/index.php?r=site/page&view=search should be https://admin.qsl.net/index.php?r=site/page&view=search.
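The difference comes down to which URL relative links are resolved against. Below is a minimal sketch using only the Python standard library. It is not ArchiveBox's code. It assumes the page uses a relative href such as `index.php?r=site/page&view=search`, and the HTML snippet and the helpers `LinkCollector` and `resolve_outlinks` are hypothetical, for illustration only. Resolving against the URL that was added reproduces the 404ing link, while resolving against the final post-redirect URL gives the correct one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already unescapes attribute values (&amp; -> &).
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def resolve_outlinks(html, base_url):
    """Hypothetical helper: resolve every href in `html` against `base_url`."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical page content, assuming the site uses relative links.
page = '<a href="index.php?r=site/page&amp;view=search">Search</a>'

requested_url = "https://qsl.net"                 # the URL that was added
final_url = "https://admin.qsl.net/index.php"     # where the redirect landed

# Resolving against the requested URL yields the broken link from the report.
print(resolve_outlinks(page, requested_url))
# ['https://qsl.net/index.php?r=site/page&view=search']

# Resolving against the post-redirect URL yields the link that actually exists.
print(resolve_outlinks(page, final_url))
# ['https://admin.qsl.net/index.php?r=site/page&view=search']
```

In other words, a crawler following depth=1 links would need to record the final URL after redirects and use it as the base for `urljoin`, rather than reusing the URL the user originally submitted.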
