[GH-ISSUE #840] Parse or Get Original Publication Date #549

Closed
opened 2026-03-02 11:50:45 +03:00 by kerem · 6 comments

Originally created by @ClintElliotMalcolm on GitHub (Jan 6, 2025).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/840

Describe the feature you'd like

RSS feeds include the original publication date, and many forums display that information as well. It would be useful to be able to filter not only by when an item was retrieved but also by when the post or article was originally written.

Describe the benefits this would bring to existing Hoarder users

When an RSS feed is first imported, it pulls in a large number of articles, including old ones. We should be able to select what is relevant to us.
Additionally, if you add a Reddit or other forum post that is old, it can be helpful to know when it was originally written rather than just when it was retrieved.

Can the goal of this request already be achieved via other means?

Not that I know of. Maybe with AI tags, but those wouldn't support date comparisons.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

Similar issue to https://github.com/hoarder-app/hoarder/issues/694. I would also like to be able to get the publication date of any article. I understand that this can be prone to failure, so this field should be human-editable. Or, even better, offer a selector listing every date-like value found in the HTML/file, defaulting to the first one, so the user can choose among them.
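
For illustration, the "selector of date-like values" idea could look roughly like the sketch below. It is only a sketch under stated assumptions: the names `DateCandidate`, `findDateCandidates`, and `pickPublicationDate` are hypothetical and not taken from the Hoarder/Karakeep codebase, and it only looks at `<meta>` tags and `<time datetime>` attributes using regular expressions rather than a real HTML parser.

```typescript
// Hypothetical sketch: none of these names come from the Hoarder/Karakeep codebase.
interface DateCandidate {
  source: string; // where the value was found, e.g. "meta:article:published_time"
  value: Date;
}

// Collect every parseable date-like value from a few common metadata locations.
function findDateCandidates(html: string): DateCandidate[] {
  const candidates: DateCandidate[] = [];
  const push = (source: string, raw: string): void => {
    const value = new Date(raw);
    // Skip anything the Date constructor cannot parse.
    if (!Number.isNaN(value.getTime())) {
      candidates.push({ source, value });
    }
  };

  // <meta property="article:published_time" content="..."> and similar tags.
  for (const tag of html.match(/<meta\s+[^>]*>/gi) ?? []) {
    const key = /(?:property|name|itemprop)\s*=\s*"([^"]+)"/i.exec(tag)?.[1];
    const content = /content\s*=\s*"([^"]+)"/i.exec(tag)?.[1];
    if (key && content && /date|published|time/i.test(key)) {
      push(`meta:${key}`, content);
    }
  }

  // <time datetime="..."> elements.
  const timeTag = /<time\s+[^>]*datetime\s*=\s*"([^"]+)"/gi;
  let m: RegExpExecArray | null;
  while ((m = timeTag.exec(html)) !== null) {
    push("time[datetime]", m[1] ?? "");
  }
  return candidates;
}

// Default to the first candidate unless the user supplied a manual override.
function pickPublicationDate(
  candidates: DateCandidate[],
  userOverride?: Date,
): Date | undefined {
  return userOverride ?? candidates[0]?.value;
}
```

The manual override in `pickPublicationDate` reflects the request that the field stay human-editable: automatic extraction only supplies a default.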


@kamtschatka commented on GitHub (Jan 6, 2025):

This is very similar to https://github.com/hoarder-app/hoarder/issues/694, where you also have commented.
Is the difference here that you want it parsed out of regular articles as well?


@ClintElliotMalcolm commented on GitHub (Jan 6, 2025):

In that issue, I interpreted the request as setting a minimum date for importing from the RSS feed. This one is about storing the publication date explicitly in the metadata within Hoarder and making it filterable and searchable as well.

Additionally, yes, I would like this for regular articles as well. The idea is similar to the citation format I learned, where you need not only the retrieved date but also the published date.

I know that this isn't always possible or easy. I mentioned the RSS feeds because at least there it is explicitly in the feed.
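
To make the RSS case concrete, here is a minimal sketch of reading the `<pubDate>` of each feed item and filtering on it. Everything in it is an assumption for illustration: `FeedItem`, `textOf`, `parseRssItems`, and `publishedSince` are hypothetical names, not Hoarder's API, and the regex-based parsing ignores CDATA sections and Atom feeds.

```typescript
// Hypothetical sketch: these names are illustrative, not Hoarder's API.
interface FeedItem {
  title: string;
  publishedAt: Date | undefined; // undefined when there is no parseable <pubDate>
}

// Read the text of the first <tag>...</tag> inside an XML fragment.
function textOf(xml: string, tag: string): string | undefined {
  const m = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "i").exec(xml);
  return m?.[1]?.trim();
}

// Extract the title and publication date of every <item> in an RSS 2.0 feed.
function parseRssItems(xml: string): FeedItem[] {
  const items: FeedItem[] = [];
  for (const block of xml.match(/<item[\s>][\s\S]*?<\/item>/gi) ?? []) {
    const raw = textOf(block, "pubDate");
    const parsed = raw ? new Date(raw) : undefined;
    items.push({
      title: textOf(block, "title") ?? "",
      publishedAt:
        parsed !== undefined && !Number.isNaN(parsed.getTime()) ? parsed : undefined,
    });
  }
  return items;
}

// Keep only items published on or after the cutoff; undated items are dropped.
function publishedSince(items: FeedItem[], cutoff: Date): FeedItem[] {
  return items.filter(
    (item) =>
      item.publishedAt !== undefined &&
      item.publishedAt.getTime() >= cutoff.getTime(),
  );
}
```

Keeping `publishedAt` separate from the retrieval time is what would allow filtering and searching on either date.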


@kamtschatka commented on GitHub (Jan 6, 2025):

Hm, maybe I am misunderstanding the other issue, but it sounds to me like the metadata should be parsed out so that you can then sort the bookmarks you have already read.
The additional information mentions adding functionality to import only items published after a certain time.
Overall, I think it makes sense to try to extract the publication date and store it as an additional field on a bookmark. The question is how reliable that would be, since the format is always different.


@ClintElliotMalcolm commented on GitHub (Jan 6, 2025):

Fair enough, I possibly misread it as well. What I mostly took from it was the part about only importing RSS articles published after a given date.

I also agree and understand that the publication date is formatted very differently across articles. But as long as it is editable, I think it is worthwhile.


@kamtschatka commented on GitHub (Jan 6, 2025):

True, if it is editable, that would make a lot of sense, so you can at least add it manually.


@MohamedBassem commented on GitHub (May 11, 2025):

This is already in the stable version.
