[GH-ISSUE #44] Support archiving URLs found in bookmark descriptions #3050

Closed
opened 2026-03-14 20:46:31 +03:00 by kerem · 2 comments
Owner

Originally created by @nodiscc on GitHub (Aug 29, 2017).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/44

It would be nice to be able to also archive links found in bookmark description fields (the <DD> HTML tag).
For example, the first bookmark below contains URLs in its description:

<DT><A HREF="https://en.wikipedia.org/wiki/SQLite" ADD_DATE="1504028066" PRIVATE="0" TAGS="doc,admin,dev,databases">SQLite - Wikipedia</A>
<DD>http://www.service-architecture.com/articles/database/sql-92.html
https://www.sqlite.org/wal.html
https://medium.com/linode-cube/sqlite-the-universal-sql-database-engine-a26199c366fc
<DT><A HREF="https://www.youtube.com/watch?v=j5nZhf8SjXw" ADD_DATE="1503928529" PRIVATE="0" TAGS="video,wtf">Needs more JPEG. - YouTube</A>

Use case: I sometimes use a single bookmark to store relevant links/comments about a topic, instead of creating multiple bookmarks (so that all information can be found in one place). I imagine that it would be better as an optional feature/config switch.
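
To make the request concrete, here is a minimal sketch of how URLs could be collected from description fields in an export like the one above. It is not ArchiveBox code: the helper name `description_urls`, the `include_descriptions` switch (standing in for the optional config switch), and the URL regex are all assumptions made for illustration, and the line-based parsing only handles exports laid out like the sample.

```python
import re

# Hypothetical pattern: an http(s) URL running up to whitespace, quotes or angle brackets.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def description_urls(export_html, include_descriptions=True):
    """Return URLs found inside <DD> description blocks of a bookmark export.

    `include_descriptions` is an assumed opt-in switch; when False nothing is collected.
    """
    if not include_descriptions:
        return []

    urls = []
    in_description = False
    for line in export_html.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        if upper.startswith('<DT>'):
            # A new bookmark entry starts, so the previous description has ended.
            in_description = False
        elif upper.startswith('<DD>'):
            # A description starts; drop the marker and scan the rest of the line.
            in_description = True
            stripped = stripped[len('<DD>'):]
        if in_description:
            urls.extend(URL_RE.findall(stripped))
    return urls
```

Calling `description_urls(open('export.html').read())` would return only the URLs listed under `<DD>`; the bookmark's own `HREF` is left to the regular import.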

kerem closed this issue 2026-03-14 20:46:37 +03:00

@pirate commented on GitHub (Aug 30, 2017):

Interesting use case, I can see how this would be helpful.

Here's a script to fix your export and make it readable by Bookmark Archiver:

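# Read the export and drop the <DT>/<DD> markers so description URLs start their own lines
with open('export.html', 'r') as f:
    export = f.read().replace('<DT>', '').replace('<DD>', '')

lines = export.split('\n')
corrected_lines = []

for line in lines:
    if line.startswith('http') and corrected_lines:
        # Reuse the timestamp and tags of the entry directly above
        # (the parent bookmark, or a description URL that was already converted)
        link_above = corrected_lines[-1]
        ts = link_above.split('ADD_DATE="', 1)[-1].split('"', 1)[0]
        tags = link_above.split('TAGS="', 1)[-1].split('"', 1)[0]
        corrected_lines.append(f'<A HREF="{line}" ADD_DATE="{ts}" PRIVATE="0" TAGS="{tags}">{line}</A>')
    else:
        corrected_lines.append(line)

# Write the converted export, with each description URL as its own <A> entry
with open('fixed_export.html', 'w') as f:
    f.write('\n'.join(corrected_lines))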

@pirate commented on GitHub (Oct 30, 2017):

Since this is a pretty rare use case, I'm closing this issue. The script above should help anyone with a similar problem.
