mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #785] Bug: Pinboard JSON parser doesn't keep original bookmarked timestamp when adding URLs #496
Originally created by @gouku on GitHub (Jul 7, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/785
Type
What is the problem that your feature request solves
I'd like to migrate from Pinboard to ArchiveBox. I have over 10,000 bookmarks dating back to 2009. I exported them as JSON and imported them into ArchiveBox, but found that all of the bookmarks ended up with the same timestamp.
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
The import process could keep the original timestamp that's already in the JSON export:
What hacks or alternative solutions have you tried to solve the problem?
Manually edit 10,000 imported bookmarks? I don't think so...
How badly do you want this new feature?
@gouku commented on GitHub (Jul 7, 2021):
There is also another issue that breaks all tags when importing bookmarks. I found a Pocket-related issue https://github.com/ArchiveBox/ArchiveBox/issues/725 that also affects Pinboard. These two issues currently block me from migrating from Pinboard.
@pirate commented on GitHub (Jul 7, 2021):
It's already supposed to be using the original timestamps, so it must be a bug in that code: https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/parsers/generic_json.py#L38 or something weird with your JSON export.
Can you post the first dozen lines of log output from when you add it? I want to make sure that it's actually recognizing it as JSON and not just Generic TXT.
@gouku commented on GitHub (Jul 7, 2021):
I think it's recognized as JSON:
I also got errors when importing the bookmarks. Only about 200 bookmarks were imported.
@pirate commented on GitHub (Aug 9, 2023):
Replicating my comment here from the PR:
(So any bugfix should fix the code that sets Snapshot.timestamp, not change Snapshot.added) Sorry for the confusion, I guess we could have clearer comments in the code explaining the differences between each field.
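To illustrate the distinction between the two fields, here is a minimal sketch and not ArchiveBox's actual parser. It assumes a Pinboard-style JSON export in which each entry carries `href`, `description`, `tags`, and an ISO 8601 `time` field. The sample data, the `parse_export` helper, and the record layout are all hypothetical. The point is only that the bookmarked time from the export goes into `timestamp`, while `added` records when the import ran.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical sample shaped like a Pinboard JSON export (field names are assumed).
SAMPLE_EXPORT = """
[
  {"href": "https://example.com/a", "description": "Example A",
   "time": "2009-03-14T15:09:26Z", "tags": "math pi"},
  {"href": "https://example.com/b", "description": "Example B",
   "time": "2021-07-07T08:00:00Z", "tags": ""}
]
"""


def parse_bookmarked_time(value: str) -> datetime:
    """Parse a UTC ISO 8601 string such as 2009-03-14T15:09:26Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_export(text: str, imported_at: datetime) -> List[Dict[str, Any]]:
    """Turn export entries into records that keep both dates separately."""
    records = []
    for entry in json.loads(text):
        bookmarked = parse_bookmarked_time(entry["time"])
        records.append({
            "url": entry["href"],
            "title": entry.get("description") or None,
            "tags": entry.get("tags", "").split(),
            # Original bookmark time taken from the export, as epoch seconds.
            "timestamp": str(bookmarked.timestamp()),
            # Time of the import itself; identical for every entry in one run.
            "added": imported_at,
        })
    return records


records = parse_export(SAMPLE_EXPORT, imported_at=datetime.now(timezone.utc))
for record in records:
    print(record["url"], record["timestamp"], record["added"].isoformat())
```

If every record shows the same `timestamp`, the bookmarked time was lost somewhere during parsing. If only `added` is identical across records, that is expected for a single import run.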
@pirate commented on GitHub (Feb 22, 2024):
Maybe we could use this to build a better pinboard importer: https://github.com/davep/tinboard
@jimwins commented on GitHub (Mar 2, 2024):
The feature suggested in issue #1367 would allow sorting/display by the bookmarked date instead of added. (I think that should probably be the default, but I'll make that case over there when I've thought it through more completely.)
We should add some test cases to verify that the timestamps are being parsed and stored correctly, but I have verified via manual testing that it does work with the JSON parser including the fixes on the dev branch.
Issue #1188 (about duplicate timestamps on imports) is another issue that can cause problems with Pinboard imports because of the assumption that specific timestamps are associated with a unique URL. This should be fixed when bookmarked (aka timestamp) is no longer used as a unique identifier for snapshots, which is issue #74. (That will also lay the groundwork for having multiple snapshots of the same URL, which is issue #179.)
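As a rough illustration of the collision problem: when the timestamp doubles as a unique key, two URLs bookmarked in the same second cannot both be stored under it. The sketch below shows one possible workaround, nudging duplicates with a numeric suffix. The `assign_unique_timestamps` helper is hypothetical and is not taken from the ArchiveBox codebase. Dropping the uniqueness assumption altogether, as described in issue #74, would make this kind of workaround unnecessary.

```python
from typing import Dict, Iterable, Tuple


def assign_unique_timestamps(bookmarks: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each timestamp key to exactly one URL.

    `bookmarks` yields (timestamp, url) pairs, where timestamp is a string of
    whole epoch seconds. When two different URLs share a timestamp, the later
    one gets a ".1", ".2", ... suffix so that the key stays unique.
    """
    used: Dict[str, str] = {}
    for timestamp, url in bookmarks:
        candidate = timestamp
        suffix = 0
        # Keep nudging while the key is already taken by a different URL.
        while candidate in used and used[candidate] != url:
            suffix += 1
            candidate = f"{timestamp}.{suffix}"
        used[candidate] = url
    return used


unique = assign_unique_timestamps([
    ("1625650000", "https://example.com/a"),
    ("1625650000", "https://example.com/b"),
    ("1625650000", "https://example.com/c"),
])
for key, url in unique.items():
    print(key, url)
```

Note that the suffixed keys no longer equal the original bookmarked time exactly, which is one reason a separate identifier is the cleaner long-term fix.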