[GH-ISSUE #610] Is there a way to extract the code snippets as well? #389

Open
opened 2026-03-02 11:49:25 +03:00 by kerem · 6 comments

Originally created by @ulises-castro on GitHub (Nov 1, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/610

Describe the feature you'd like

It would be great if articles could be saved along with their code snippets; currently the snippets do not seem to be included.

A flag to choose whether code snippets are included would be nice.

image: https://github.com/user-attachments/assets/f67e0824-49cf-409c-b154-2a6c66ea82a3

Describe the benefits this would bring to existing Hoarder users

You could take notes and review code implementations later, e.g., when reviewing an API and wanting to return to the point where you left off.

Can the goal of this request already be achieved via other means?

I'm not sure yet about this.

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response

@kamtschatka commented on GitHub (Nov 1, 2024):

Please provide a sample of where you got this.

@ulises-castro commented on GitHub (Nov 1, 2024):

> Please provide a sample of where you got this.

What do you mean?

I took that screenshot from the original article. I think we could use a code highlighter to display the code and have the crawler extract it.
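The crawler-side extraction suggested above could look roughly like the sketch below. Highlighters such as Pygments typically wrap each token of a snippet in `<span>` elements inside a `<pre>` block, so plain code can be recovered by taking each `<pre>` block, dropping the inner tags, and decoding the common HTML entities. The function name `extractCodeSnippets` is hypothetical, and the regex approach assumes `<pre>` blocks are not nested; a real implementation would more likely walk a parsed DOM.

```javascript
// Hypothetical helper: pull plain-text code out of highlighted <pre> blocks.
// Assumes <pre> elements are not nested; a DOM parser would be more robust.
function extractCodeSnippets(html) {
  const snippets = [];
  const prePattern = /<pre\b[^>]*>([\s\S]*?)<\/pre>/gi;
  for (const match of html.matchAll(prePattern)) {
    const text = match[1]
      // Drop highlighter markup such as <span class="kn"> and <code> tags.
      .replace(/<[^>]+>/g, "")
      // Decode the handful of entities that commonly appear in code.
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&quot;/g, '"')
      .replace(/&#39;/g, "'")
      .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;"
    snippets.push(text);
  }
  return snippets;
}

const sample =
  '<div class="highlight"><pre><span class="kn">import</span> <span class="nn">grpc</span>\n' +
  '<span class="k">if</span> a &lt; b: <span class="k">pass</span></pre></div>';
console.log(extractCodeSnippets(sample)); // [ 'import grpc\nif a < b: pass' ]
```

The extracted strings could then be stored alongside the article text and rendered with any code highlighter on display.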

@kamtschatka commented on GitHub (Nov 1, 2024):

A URL.

@ulises-castro commented on GitHub (Nov 2, 2024):

> A URL.

https://realpython.com/python-microservices-grpc/#asyncio-and-grpc

@kamtschatka commented on GitHub (Nov 6, 2024):

I had a look at this. We are using DOMPurify, which already strips those code blocks. It is possible to change the code like this:

  const purifiedHTML = purify.sanitize(htmlContent, {ADD_TAGS: ["pre", "code", "span"]});

With that change the code block is retained, but we then pass the result through mozilla/readability, which drops the code block.
I don't see any way to configure this, so supporting it would definitely be a bigger rework.
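One way to avoid part of that rework might be to shield code blocks from the extraction step: swap each `<pre>` block for a unique text placeholder before the HTML is handed to Readability, then swap the original markup back into Readability's output. This is only a sketch under assumptions, not how Karakeep currently works. The placeholder format and the helper names `protectCodeBlocks` / `restoreCodeBlocks` are hypothetical, it assumes Readability keeps the short placeholder paragraphs (which it may not always do), and the regex assumes `<pre>` blocks are not nested.

```javascript
// Hypothetical placeholder format; any token unlikely to occur in articles works.
const PLACEHOLDER_PREFIX = "KARAKEEP_CODE_BLOCK_";

// Replace each <pre>...</pre> block with a placeholder paragraph and remember it.
function protectCodeBlocks(html) {
  const blocks = [];
  const protectedHtml = html.replace(/<pre\b[^>]*>[\s\S]*?<\/pre>/gi, (block) => {
    blocks.push(block);
    return `<p>${PLACEHOLDER_PREFIX}${blocks.length - 1}</p>`;
  });
  return { html: protectedHtml, blocks };
}

// Put the original blocks back wherever their placeholders survived extraction.
// Unknown indices are left untouched.
function restoreCodeBlocks(html, blocks) {
  const pattern = new RegExp(`(?:<p>)?${PLACEHOLDER_PREFIX}(\\d+)(?:</p>)?`, "g");
  return html.replace(pattern, (match, index) => blocks[Number(index)] ?? match);
}

// Usage, with the extraction step skipped for brevity
// (in practice: protect, sanitize, run Readability, then restore).
const article = '<p>Intro</p><pre><code>print("hi")</code></pre><p>Outro</p>';
const { html, blocks } = protectCodeBlocks(article);
console.log(html); // <p>Intro</p><p>KARAKEEP_CODE_BLOCK_0</p><p>Outro</p>
console.log(restoreCodeBlocks(html, blocks) === article); // true
```

Because the code markup never passes through Readability, its scoring heuristics cannot discard it; the trade-off is that the sanitizer would still need to allow `pre`, `code` and `span` (as in the `ADD_TAGS` change above) when the blocks are restored.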

@jakob1379 commented on GitHub (Jan 27, 2025):

You could add something along the lines of

"if there are any code snippets, add them to a code block" to the AI summary prompt.
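A minimal sketch of that suggestion, assuming the summary is generated from a prompt string assembled before calling the model. The extra instruction is appended only when a hypothetical `preserveCodeSnippets` option is on, which mirrors the on/off flag suggested in the original request. The function name, option name and prompt wording are illustrative, not Karakeep's actual prompt.

```javascript
// Hypothetical prompt builder; wording and option names are illustrative only.
function buildSummaryPrompt(content, { preserveCodeSnippets = false } = {}) {
  const instructions = ["Summarize the following article in a few sentences."];
  if (preserveCodeSnippets) {
    instructions.push(
      "If there are any code snippets, add them to a code block in the summary.",
    );
  }
  // Instructions first, then a blank line, then the extracted article text.
  return `${instructions.join("\n")}\n\n${content}`;
}

console.log(buildSummaryPrompt("ARTICLE TEXT", { preserveCodeSnippets: true }));
```

Note that this only helps if the code is still present in the text handed to the model; if it was already dropped during extraction, the prompt alone cannot bring it back.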

Reference
starred/karakeep#389