[GH-ISSUE #1337] Feature Request: use cache control headers to determine if content has changed since last snapshot #2327
Originally created by @Juliaria08 on GitHub (Jan 28, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1337
What is the problem that your feature request solves
I'd like to request that ArchiveBox send an If-Modified-Since header when it has already fetched a website and the website sent a Last-Modified header. Alternatively, it could send If-None-Match with the stored value of the ETag response header. That way, feeds like Rachel Kroll's feed could be re-fetched without having to wait a full day. It would also reduce the load of repeated fetches on both our host and the remote host.
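A minimal sketch of the requested behaviour in Python (the language ArchiveBox is written in), using only the standard library's urllib. The CachedValidators record, the helper names, and where the validators would be stored between snapshots are hypothetical and are not ArchiveBox's actual API. Both headers can be sent together: under RFC 9110, a server that receives If-None-Match evaluates it and ignores If-Modified-Since, so the ETag effectively takes precedence.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedValidators:
    """Validators saved from the previous snapshot's response (hypothetical storage)."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: CachedValidators) -> dict:
    """Build conditional request headers from the stored validators.

    Sending both is allowed; a server that understands If-None-Match
    evaluates it and ignores If-Modified-Since.
    """
    headers = {}
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def validators_from_headers(headers) -> CachedValidators:
    """Extract the validators to store for the next snapshot.

    `headers` is anything with a .get() method, e.g. response.headers.
    """
    return CachedValidators(
        etag=headers.get("ETag"),
        last_modified=headers.get("Last-Modified"),
    )


def fetch_if_changed(url: str, cached: CachedValidators, timeout: float = 30.0):
    """Return (body, new_validators), or (None, cached) if the server replied 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            return body, validators_from_headers(response.headers)
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; other codes are real failures.
        if err.code == 304:
            return None, cached
        raise
```

A None body means the page has not changed since the last snapshot, so the caller could skip creating a new one and keep the old validators.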
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Sorry, I should have read the whole form before filling in the section above. ArchiveBox should send If-Modified-Since, using the value the server sent in Last-Modified, if it has already fetched the website before. Alternatively, it should send If-None-Match with the value of ETag if one was returned. I don't know whether sending both is allowed, but if it is, I guess preferring If-Modified-Since would be acceptable.
What hacks or alternative solutions have you tried to solve the problem?
I've considered putting an HTTP proxy that would store those validators between ArchiveBox and the sites it fetches, but that isn't a clean solution.
How badly do you want this new feature?
I don't mind too much, but I'd appreciate having it, since ArchiveBox can put strain on servers, and we might get blocked from archiving sites if we crawl too deep.
I'm a fairly "new" sysadmin and haven't deployed ArchiveBox in a public environment; it only runs on my laptop. I could easily set it up, since I've already deployed other Django-based apps, but I don't have the time to do it.