[GH-ISSUE #944] Bug: Broken URLs block entire import when using depth=1 #2095

Closed
opened 2026-03-01 17:56:25 +03:00 by kerem · 6 comments

Originally created by @devsorice on GitHub (Mar 14, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/944

Describe the bug

When using the command "archivebox add", it is impossible to import a list of hundreds or thousands of good links if it contains even one broken link.
Is there a way to ignore broken links?
I understand that the devs might have intended this as a feature, but the CLI is not usable this way: if you happen to have even one link that doesn't work, you can't import anything.
See also https://github.com/ArchiveBox/ArchiveBox/issues/444

Steps to reproduce

Put any broken link, or a link to a website that is currently offline, in a txt file and import that file with the command below.

Screenshots or log output

Command used
docker-compose run -e ONLY_NEW=true archivebox add --depth=1 --tag=barra-dei-preferiti < barra-dei-preferiti.txt

Output
[!] Failed to download https://www.wunderlist.com/webapp/#/tasks/859529680

HTTPSConnectionPool(host='www.wunderlist.com', port=443): Max retries exceeded with url: /webapp/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0adabe64c0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
ERROR: 1

ArchiveBox version

docker tag :latest

kerem closed this issue 2026-03-01 17:56:26 +03:00

@pirate commented on GitHub (Mar 15, 2022):

This is specific to depth=1. For now I suggest adding in two passes: one with depth=0 and the next with depth=1.
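
Another way to keep one dead host from aborting a depth=1 run is to filter the list before it reaches "archivebox add". The following is only a minimal sketch of that idea, not part of ArchiveBox: the function name `split_resolvable` and its `resolve` parameter are hypothetical, and it only checks that each hostname resolves in DNS (the failure shown in the log above), not that the page actually loads.

```python
import socket
from urllib.parse import urlsplit


def split_resolvable(urls, resolve=socket.gethostbyname):
    """Split URLs into (resolvable, unresolvable) lists.

    Hypothetical helper, not an ArchiveBox API. `resolve` is any callable
    that raises OSError when a hostname cannot be resolved.
    """
    good, bad = [], []
    cache = {}  # hostname -> bool, so each host is looked up only once
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        host = urlsplit(url).hostname
        if host is None:
            bad.append(url)  # no hostname at all, e.g. a stray line of text
            continue
        if host not in cache:
            try:
                resolve(host)
                cache[host] = True
            except OSError:  # socket.gaierror is a subclass of OSError
                cache[host] = False
        (good if cache[host] else bad).append(url)
    return good, bad
```

Calling `split_resolvable(open("barra-dei-preferiti.txt"))` and writing the first returned list to a new file would give a list that can be piped to "archivebox add" as before, while the second list shows which links were dropped.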


@devsorice commented on GitHub (Mar 15, 2022):

Sorry, my bad,
I didn't even want to archive links one hop away, only the ones listed in the txt file.
So without depth=1, a URL that doesn't resolve shouldn't block the import?


@pirate commented on GitHub (Mar 15, 2022):

Correct, it won't block the import when using depth=0 @devsorice.


@devsorice commented on GitHub (Mar 15, 2022):

Thank you!
(and sorry for the stupid question btw)


@pirate commented on GitHub (Mar 16, 2022):

I actually want to leave this open to fix the depth=1 blocking issue. Not a stupid question at all; it's a valid bug.


@pirate commented on GitHub (May 10, 2022):

Fixed in 8cfe6f4
