mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[GH-ISSUE #944] Bug: Broken URLs block entire import when using depth=1 #2095
Originally created by @devsorice on GitHub (Mar 14, 2022).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/944
Describe the bug
While using the "archivebox add" command,
it is impossible to import a list of hundreds or thousands of good links if there is even one broken link.
Is there a way to ignore broken links?
I understand that the devs might have intended this as a feature, but it makes the CLI unusable: if you happen to
have even one link that doesn't work, you can't import anything.
See also https://github.com/ArchiveBox/ArchiveBox/issues/444
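One way to sidestep the blocking behavior while it is unfixed (a hedged sketch, not part of the original report; `filter_resolvable` is a hypothetical helper) is to pre-filter the URL list, dropping entries whose hostname does not resolve, before piping it to `archivebox add`:

```python
import socket
from urllib.parse import urlparse

def filter_resolvable(urls):
    """Split URLs into (resolvable, broken) based on whether the
    hostname resolves via DNS. This is only a rough pre-filter:
    a host can resolve and still fail to serve the page."""
    good, bad = [], []
    for url in urls:
        host = urlparse(url).hostname
        try:
            if host is None:
                raise OSError("no hostname in URL")
            socket.getaddrinfo(host, None)  # raises on DNS failure
            good.append(url)
        except OSError:
            bad.append(url)
    return good, bad

if __name__ == "__main__":
    with open("barra-dei-preferiti.txt") as f:
        urls = [line.strip() for line in f if line.strip()]
    good, bad = filter_resolvable(urls)
    # Write only the resolvable URLs back out for archivebox to consume
    with open("importable.txt", "w") as f:
        f.write("\n".join(good))
```

The filtered `importable.txt` can then be fed to `archivebox add` in place of the original list.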
Steps to reproduce
Just put any broken link, or a link to a website that is currently offline, into a txt file.
Screenshots or log output
Command used
docker-compose run -e ONLY_NEW=true archivebox add --depth=1 --tag=barra-dei-preferiti < barra-dei-preferiti.txt
Output
[!] Failed to download https://www.wunderlist.com/webapp/#/tasks/859529680
HTTPSConnectionPool(host='www.wunderlist.com', port=443): Max retries exceeded with url: /webapp/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0adabe64c0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
ERROR: 1
ArchiveBox version
docker tag :latest
@pirate commented on GitHub (Mar 15, 2022):
This is specific to depth=1. For now I suggest adding in two passes: one with depth=0 and the next with depth=1.
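Sketched concretely (a hedged example reusing the command and flags from the report above, not a verbatim recommendation from the thread), the two passes might look like:

```shell
# Pass 1: import only the URLs listed in the file;
# depth=0 does not block on a dead link
docker-compose run -e ONLY_NEW=true archivebox add --depth=0 < barra-dei-preferiti.txt

# Pass 2: retry with depth=1 to also pull in pages one hop away;
# ONLY_NEW skips everything already archived in pass 1
docker-compose run -e ONLY_NEW=true archivebox add --depth=1 < barra-dei-preferiti.txt
```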
@devsorice commented on GitHub (Mar 15, 2022):
Sorry,
my bad,
I didn't even want to archive links one hop away,
only the ones listed inside the txt file.
So without depth=1, if a URL doesn't resolve it shouldn't block the import?
@pirate commented on GitHub (Mar 15, 2022):
Correct, it won't block the import when using depth=0 @devsorice.
@devsorice commented on GitHub (Mar 15, 2022):
Thank you!
(and sorry for the stupid question btw)
@pirate commented on GitHub (Mar 16, 2022):
I actually want to leave this open to fix the depth=1 blocking issue. Not a stupid question at all; it's a valid bug.
@pirate commented on GitHub (May 10, 2022):
Fixed in
8cfe6f4