mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[PR #1678] [CLOSED] Adding MAX_URL_ATTEMPTS to stop retrying failed URLs #4473
📋 Pull Request Information
Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1678
Author: @warenhaus
Created: 4/24/2025
Status: ❌ Closed
Base: dev ← Head: dev
📝 Commits (8)
20c86a7 Update common.py
818aea4 Create 0075_add_max_url_retries.py
2eadf3c Update models.py
b726198 Update ArchiveBox.conf.default
b647b12 Update __init__.py
d00bffb Update __init__.py
e9a8bbf Update common.py
912eba6 Update ArchiveBox.conf.default
📊 Changes
5 files changed (+115 additions, -80 deletions)
📝 archivebox/config/common.py (+2 -0)
➕ archivebox/core/migrations/0075_add_max_url_retries.py (+16 -0)
📝 archivebox/core/models.py (+1 -0)
📝 archivebox/extractors/__init__.py (+95 -80)
📝 etc/ArchiveBox.conf.default (+1 -0)
📄 Description
Summary
This PR adds MAX_URL_ATTEMPTS, a configuration option (default 0, meaning unlimited) that stops ArchiveBox from retrying failed URLs, together with a retry_count column in the database that tracks the number of attempts. Once a snapshot has been retried MAX_URL_ATTEMPTS times, it is skipped on further updates.
MAX_URL_ATTEMPTS = 0 means unlimited retries, which matches ArchiveBox's behaviour before this change.
The change does not take into account any date info, as was suggested here.
The changes amount to only a couple of lines of logic; the diff looks larger because of changed indentation.
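The retry limit described in the summary can be illustrated with a minimal Python sketch. It is not the code from this PR: `SnapshotStub`, `should_skip`, and `update_snapshot` are hypothetical names, the snapshot is a plain dataclass rather than ArchiveBox's model, and the sketch assumes the counter is incremented on every attempt, which the description does not spell out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SnapshotStub:
    """Hypothetical stand-in for a snapshot row; not ArchiveBox's real model."""
    url: str
    retry_count: int = 0  # mirrors the retry_count column described above


def should_skip(snapshot: SnapshotStub, max_url_attempts: int = 0) -> bool:
    """Return True once the snapshot has used up its allowed attempts."""
    if max_url_attempts <= 0:
        # 0 means unlimited retries, i.e. the behaviour before the change.
        return False
    return snapshot.retry_count >= max_url_attempts


def update_snapshot(
    snapshot: SnapshotStub,
    archive: Callable[[str], object],
    max_url_attempts: int = 0,
) -> bool:
    """Run `archive` on the URL unless the limit is reached.

    Returns True if an attempt was made, False if the snapshot was skipped.
    Assumption: every attempt counts toward the limit, successful or not.
    """
    if should_skip(snapshot, max_url_attempts):
        return False
    snapshot.retry_count += 1
    archive(snapshot.url)
    return True


if __name__ == "__main__":
    snap = SnapshotStub("https://example.com")
    for _ in range(3):
        ran = update_snapshot(snap, lambda url: None, max_url_attempts=2)
        print(ran, snap.retry_count)
```

With a limit of 2, the third call in the usage loop makes no attempt and leaves the counter unchanged; with the default of 0, no snapshot is ever skipped.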
Related issues
#109
Changes these areas
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.