mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-26 01:26:00 +03:00
[GH-ISSUE #119] History entry timestamps aren't accurate #81
Originally created by @kergoth on GitHub (Dec 3, 2018).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/119
Firefox uses PRTime and Chrome uses WebKit timestamps, neither of which matches bookmark-archiver's timestamp expectations as is. Firefox's timestamps need to be multiplied by 10, otherwise this year's history entries show up as 1974, and Chrome's timestamps are in microseconds from 1601. To work around this, use `(last_visit_time-11644446702000000)*10` rather than `last_visit_time` for Chrome, and `last_visit_date*10` rather than `last_visit_date` for Firefox.
I'm also testing the addition of Safari history export, but its dates require further massaging than the other two, as they're Mac Absolute Time in `<seconds from 2001>.<microseconds>` form. Just multiplying to eliminate the decimal doesn't work, as the microseconds lack leading zero padding.
For reference, see:
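As a rough illustration of the epoch differences described in this report, here is a minimal Python sketch that normalizes each browser's value to a timezone-aware UTC `datetime`. It is not the workaround above and not bookmark-archiver's code. The function names are hypothetical, and it assumes the target is plain Unix time. It derives the offsets from the standard epochs (1601-01-01 for WebKit, 1970-01-01 for PRTime, 2001-01-01 for Mac Absolute Time) rather than using the constants from the workaround. The Safari helper assumes the exported value is an ordinary decimal number of seconds. It does not try to handle the unpadded-microseconds form mentioned above.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Standard epochs for each timestamp format (all UTC).
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def chrome_to_datetime(last_visit_time: int) -> datetime:
    """Chrome/WebKit: microseconds since 1601-01-01 UTC."""
    return WEBKIT_EPOCH + timedelta(microseconds=last_visit_time)


def firefox_to_datetime(last_visit_date: int) -> datetime:
    """Firefox PRTime: microseconds since 1970-01-01 UTC."""
    return UNIX_EPOCH + timedelta(microseconds=last_visit_date)


def safari_to_datetime(visit_time: str) -> datetime:
    """Safari: seconds since 2001-01-01 UTC, assumed to be a decimal string."""
    seconds = Decimal(visit_time)  # Decimal avoids float rounding of the fraction
    return MAC_EPOCH + timedelta(microseconds=int(seconds * 1_000_000))


def to_unix_seconds(dt: datetime) -> float:
    """Seconds since the Unix epoch for a timezone-aware datetime."""
    return dt.timestamp()
```

With helpers like these, a history row from any of the three browsers can be converted once at import time, for example `to_unix_seconds(chrome_to_datetime(row_value))`, where `row_value` stands in for the raw column value.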
@pirate commented on GitHub (Dec 4, 2018):
Thanks for pointing this out.
Timestamps seem to be fundamentally flawed as a unique identifier. The new design I'm working on makes them entirely optional and uses a sha256 of the URL instead, but it's going to be hard to change the folder layout of the archive to hashes if everyone's existing archives are timestamp-based.
Related to: https://github.com/pirate/bookmark-archiver/issues/74
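For illustration only, a URL-derived identifier of the kind mentioned above could look something like the following sketch. The function name `url_key` is hypothetical, and the sketch hashes the URL string exactly as given. Whether the real design normalizes the URL first or truncates the digest is not stated in this thread.

```python
import hashlib


def url_key(url: str) -> str:
    """Return the hex SHA-256 digest of the URL's UTF-8 bytes (no normalization)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```

Unlike a timestamp, this key depends only on the URL, so importing the same link from two sources with different or inaccurate visit times would yield the same identifier.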
@pirate commented on GitHub (Mar 30, 2019):
@kergoth a quick update, v0.3.0 adds some improvement to the timestamp parsing, but it's still not perfect.
It doesn't yet handle Firefox's timestamps being off by 10x, and Chrome's timestamps aren't fixed from 1601 yet either, but it's a start:
https://github.com/pirate/ArchiveBox/blob/dev/archivebox/util.py#L369
@pirate commented on GitHub (Jul 24, 2020):
I think the latest `django` branch gets us as close as we're going to get without implementing custom offset parsing for different sources.
Comment back here if you're still having trouble with timestamps being wildly off and I can reopen the ticket.