[GH-ISSUE #537] URL seems to get hoarder stuck in a crawl loop on v0.18.0 #347

Open
opened 2026-03-02 11:49:04 +03:00 by kerem · 8 comments
Owner

Originally created by @dimatx on GitHub (Oct 14, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/537

I seem to have a URL that gets hoarder stuck in a loop: it crawls the page, then recrawls it, and so on. It only stops when I delete the bookmark.

Please let me know if you need any more info than what I provided.

2024-10-14T16:50:49.322Z info: [search][890] Attempting to index bookmark with id q10606sx8ev4xhstqbjw5gaq ...
2024-10-14T16:50:49.340Z info: [inference][889] Starting an inference job for bookmark with id "q10606sx8ev4xhstqbjw5gaq"
2024-10-14T16:50:49.511Z info: [search][890] Completed successfully
2024-10-14T16:50:50.949Z info: [inference][889] Inferring tag for bookmark "q10606sx8ev4xhstqbjw5gaq" used 2936 tokens and inferred: LineageOS,Lenovo ThinkSmart View,Home Automation,Android Installation,Open Source
2024-10-14T16:50:51.001Z info: [inference][889] Completed successfully
2024-10-14T16:50:51.556Z info: [search][891] Attempting to index bookmark with id q10606sx8ev4xhstqbjw5gaq ...
2024-10-14T16:50:51.673Z info: [search][891] Completed successfully
2024-10-14T16:54:23.815Z info: [Crawler][892] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:54:23.815Z info: [Crawler][892] Attempting to determine the content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/
2024-10-14T16:54:23.882Z info: [Crawler][892] Content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/ is "text/html; charset=UTF-8"
2024-10-14T16:54:23.907Z info: [search][893] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:54:23.975Z info: [search][893] Completed successfully
2024-10-14T16:54:26.944Z info: [Crawler][892] Successfully navigated to "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/". Waiting for the page to load ...
2024-10-14T16:54:29.602Z info: [Crawler][892] Finished waiting for the page to load.
2024-10-14T16:54:29.833Z info: [Crawler][892] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-10-14T16:54:29.842Z info: [Crawler][892] Will attempt to extract metadata from page ...
2024-10-14T16:54:30.699Z info: [Crawler][892] Will attempt to extract readable content ...
2024-10-14T16:54:31.384Z info: [Crawler][892] Done extracting readable content.
2024-10-14T16:54:31.396Z info: [Crawler][892] Stored the screenshot as assetId: 249e0a78-90b8-4e26-b051-17730c928aae
2024-10-14T16:54:31.443Z info: [Crawler][892] Done extracting metadata from the page.
2024-10-14T16:54:31.443Z info: [Crawler][892] Downloading image from "https://odsonfinance.com/wp-content/uploads/2024/01/How-to-do-a-Backdoor-Roth-IRA-1.png"
2024-10-14T16:54:31.553Z info: [Crawler][892] Downloaded image as assetId: c6acee98-fa08-4076-8f7c-8e05becff000
2024-10-14T16:54:31.612Z info: [Crawler][892] Completed successfully
2024-10-14T16:54:32.419Z info: [search][895] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:54:32.437Z info: [inference][894] Starting an inference job for bookmark with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:54:32.554Z info: [search][895] Completed successfully
2024-10-14T16:54:33.930Z info: [inference][894] Inferring tag for bookmark "k1l8zj5ixpgj9hugbvmibqfc" used 2122 tokens and inferred: Roth IRA,Backdoor Roth,Fidelity,Personal Finance,Investing
2024-10-14T16:54:33.971Z info: [inference][894] Completed successfully
2024-10-14T16:54:34.587Z info: [search][896] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:54:34.652Z info: [search][896] Completed successfully
2024-10-14T16:55:03.684Z info: [Crawler][897] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:55:03.684Z info: [Crawler][897] Attempting to determine the content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/
2024-10-14T16:55:03.757Z info: [Crawler][897] Content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/ is "text/html; charset=UTF-8"
2024-10-14T16:55:07.264Z info: [Crawler][897] Successfully navigated to "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/". Waiting for the page to load ...
2024-10-14T16:55:11.589Z info: [Crawler][897] Finished waiting for the page to load.
2024-10-14T16:55:11.803Z info: [Crawler][897] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-10-14T16:55:11.810Z info: [Crawler][897] Will attempt to extract metadata from page ...
2024-10-14T16:55:12.468Z info: [Crawler][897] Will attempt to extract readable content ...
2024-10-14T16:55:13.130Z info: [Crawler][897] Done extracting readable content.
2024-10-14T16:55:13.141Z info: [Crawler][897] Stored the screenshot as assetId: 9a16e763-4619-46d3-8e9e-281e2280acec
2024-10-14T16:55:13.181Z info: [Crawler][897] Done extracting metadata from the page.
2024-10-14T16:55:13.182Z info: [Crawler][897] Downloading image from "https://odsonfinance.com/wp-content/uploads/2024/01/How-to-do-a-Backdoor-Roth-IRA-1.png"
2024-10-14T16:55:13.289Z info: [Crawler][897] Downloaded image as assetId: 76a7a90b-ccab-462e-9f06-9afa6799cf93
2024-10-14T16:55:13.381Z info: [Crawler][897] Will attempt to archive page ...
2024-10-14T16:55:14.169Z info: [inference][898] Starting an inference job for bookmark with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:55:14.186Z info: [search][899] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:55:14.249Z info: [search][899] Completed successfully
2024-10-14T16:55:16.005Z info: [inference][898] Inferring tag for bookmark "k1l8zj5ixpgj9hugbvmibqfc" used 2122 tokens and inferred: Roth IRA,Backdoor Roth,Fidelity,Investing,Personal Finance
2024-10-14T16:55:16.041Z info: [inference][898] Completed successfully
2024-10-14T16:55:16.287Z info: [search][900] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:55:16.348Z info: [search][900] Completed successfully
2024-10-14T16:56:03.719Z info: [Crawler][897] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:56:03.719Z info: [Crawler][897] Attempting to determine the content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/
2024-10-14T16:56:03.771Z info: [Crawler][897] Content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/ is "text/html; charset=UTF-8"
2024-10-14T16:56:06.978Z info: [Crawler][897] Successfully navigated to "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/". Waiting for the page to load ...
2024-10-14T16:56:09.540Z info: [Crawler][897] Finished waiting for the page to load.
2024-10-14T16:56:09.737Z info: [Crawler][897] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-10-14T16:56:09.747Z info: [Crawler][897] Will attempt to extract metadata from page ...
2024-10-14T16:56:10.481Z info: [Crawler][897] Will attempt to extract readable content ...
2024-10-14T16:56:11.025Z info: [Crawler][897] Done extracting readable content.
2024-10-14T16:56:11.038Z info: [Crawler][897] Stored the screenshot as assetId: 72382d5e-3a19-4382-83d2-1e1cec207c1e
2024-10-14T16:56:11.086Z info: [Crawler][897] Done extracting metadata from the page.
2024-10-14T16:56:11.086Z info: [Crawler][897] Downloading image from "https://odsonfinance.com/wp-content/uploads/2024/01/How-to-do-a-Backdoor-Roth-IRA-1.png"
2024-10-14T16:56:11.215Z info: [Crawler][897] Downloaded image as assetId: 6d054521-87bd-4e25-9b9a-f12c81784706
2024-10-14T16:56:11.312Z info: [Crawler][897] Will attempt to archive page ...
2024-10-14T16:56:12.066Z info: [inference][901] Starting an inference job for bookmark with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:56:12.082Z info: [search][902] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:56:12.146Z info: [search][902] Completed successfully
2024-10-14T16:56:15.205Z info: [inference][901] Inferring tag for bookmark "k1l8zj5ixpgj9hugbvmibqfc" used 2123 tokens and inferred: Backdoor Roth IRA,Fidelity,Retirement Planning,Personal Finance,Investing
2024-10-14T16:56:15.258Z info: [inference][901] Completed successfully
2024-10-14T16:56:16.187Z info: [search][903] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:56:16.301Z info: [search][903] Completed successfully
2024-10-14T16:57:03.761Z info: [Crawler][897] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:57:03.761Z info: [Crawler][897] Attempting to determine the content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/
2024-10-14T16:57:03.813Z info: [Crawler][897] Content-type for the url https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/ is "text/html; charset=UTF-8"
2024-10-14T16:57:08.863Z info: [Crawler][897] Successfully navigated to "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/". Waiting for the page to load ...
2024-10-14T16:57:11.373Z info: [Crawler][897] Finished waiting for the page to load.
2024-10-14T16:57:11.577Z info: [Crawler][897] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-10-14T16:57:11.586Z info: [Crawler][897] Will attempt to extract metadata from page ...
2024-10-14T16:57:12.127Z info: [Crawler][897] Will attempt to extract readable content ...
2024-10-14T16:57:12.678Z info: [Crawler][897] Done extracting readable content.
2024-10-14T16:57:12.690Z info: [Crawler][897] Stored the screenshot as assetId: c9b9223a-8cfc-4061-aab7-362035ec162e
2024-10-14T16:57:12.733Z info: [Crawler][897] Done extracting metadata from the page.
2024-10-14T16:57:12.733Z info: [Crawler][897] Downloading image from "https://odsonfinance.com/wp-content/uploads/2024/01/How-to-do-a-Backdoor-Roth-IRA-1.png"
2024-10-14T16:57:12.850Z info: [Crawler][897] Downloaded image as assetId: 88122b54-c814-439b-96bf-f265809f2cbe
2024-10-14T16:57:12.945Z info: [Crawler][897] Will attempt to archive page ...
2024-10-14T16:57:13.718Z info: [inference][904] Starting an inference job for bookmark with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:57:13.735Z info: [search][905] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:57:13.803Z info: [search][905] Completed successfully
2024-10-14T16:57:15.744Z info: [inference][904] Inferring tag for bookmark "k1l8zj5ixpgj9hugbvmibqfc" used 2124 tokens and inferred: Backdoor Roth IRA,Fidelity,Personal Finance,Retirement Planning,Investing Strategies
2024-10-14T16:57:15.780Z info: [inference][904] Completed successfully
2024-10-14T16:57:15.830Z info: [search][906] Attempting to index bookmark with id k1l8zj5ixpgj9hugbvmibqfc ...
2024-10-14T16:57:15.945Z info: [search][906] Completed successfully

@raviwarrier commented on GitHub (Oct 15, 2024):

I have a similar problem with really old bookmarks from now-defunct websites or apps. When I search for their URLs in hoarder, I get no results, so I have no easy way to find and delete them.


@kamtschatka commented on GitHub (Oct 15, 2024):

I tried adding the URL "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" and everything works just fine.
Are you on the latest version? How are you deploying hoarder?


@dimatx commented on GitHub (Oct 16, 2024):

I'm deploying with Docker Compose and am on the latest version. Is there any other info I can provide for troubleshooting, assuming I can reproduce it?


@kamtschatka commented on GitHub (Oct 16, 2024):

Do you have any environment variables set?


@dimatx commented on GitHub (Oct 16, 2024):

Did you try downloading the full page archive? That seems to be what is causing the loop; the process never seems to finish.

Here's my docker compose and .env.

version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3200:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      OPENAI_API_KEY: *********************************
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.6
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data
volumes:
  meilisearch: null
  data: null
networks: {}

HOARDER_VERSION=release
NEXTAUTH_SECRET=*******************
MEILI_MASTER_KEY=*******************
NEXTAUTH_URL=http://*******************:3200

@kamtschatka commented on GitHub (Oct 19, 2024):

Try increasing CRAWLER_JOB_TIMEOUT_SEC. The default is 60 seconds; if the full page archival takes longer than that, it might cause this behavior.
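
One way to check whether the timeout is involved is to compare the spacing of the repeated "Will crawl" lines with the 60-second default. The following is a minimal sketch, not part of hoarder: the timestamps are copied from the `[Crawler][897]` lines in the logs posted above, and the helper names `parse_timestamp` and `gaps_in_seconds` are hypothetical.

```python
from datetime import datetime

# Timestamps of the three "[Crawler][897] Will crawl ..." lines in the posted logs.
CRAWL_STARTS = [
    "2024-10-14T16:55:03.684Z",
    "2024-10-14T16:56:03.719Z",
    "2024-10-14T16:57:03.761Z",
]


def parse_timestamp(value):
    """Parse the ISO-style timestamp format used in the log lines."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def gaps_in_seconds(timestamps):
    """Return the time between consecutive timestamps, in seconds."""
    parsed = [parse_timestamp(t) for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    for gap in gaps_in_seconds(CRAWL_STARTS):
        print(f"{gap:.3f} s between crawl starts")
```

If the gaps sit close to the configured timeout, that would be consistent with the job being cut off and started again, though it does not prove it.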


@dimatx commented on GitHub (Oct 19, 2024):

I set it to 300 seconds, but the issue persists. Isn't it strange that there is a loop despite no errors or failures in the logs? According to the logs, it also uses OpenAI credits over and over, so it could run up a bill for someone who hasn't set a low budget in OpenAI.
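
Even without error lines, the loop is visible in the logs as a crawler job that starts repeatedly and never logs "Completed successfully". The following is a minimal sketch for spotting that pattern, not part of hoarder: it assumes the log line format shown in the original post, the function names `summarize_crawler_jobs` and `looping_jobs` are hypothetical, and the sample lines are copied from the logs above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2024-10-14T16:54:31.612Z info: [Crawler][892] Completed successfully
LINE_RE = re.compile(r"^(\S+) info: \[(\w+)\]\[(\d+)\] (.*)$")


def summarize_crawler_jobs(log_text):
    """Return {job_id: {"starts": [timestamps], "completed": bool}} for Crawler jobs."""
    jobs = defaultdict(lambda: {"starts": [], "completed": False})
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, worker, job_id, message = match.groups()
        if worker != "Crawler":
            continue
        if message.startswith("Will crawl "):
            jobs[job_id]["starts"].append(timestamp)
        elif message == "Completed successfully":
            jobs[job_id]["completed"] = True
    return dict(jobs)


def looping_jobs(summary):
    """Job IDs that started more than once and never logged completion."""
    return [
        job_id
        for job_id, info in summary.items()
        if len(info["starts"]) > 1 and not info["completed"]
    ]


# A few lines copied verbatim from the logs in the original post.
SAMPLE = """
2024-10-14T16:54:23.815Z info: [Crawler][892] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:54:31.612Z info: [Crawler][892] Completed successfully
2024-10-14T16:55:03.684Z info: [Crawler][897] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:55:13.381Z info: [Crawler][897] Will attempt to archive page ...
2024-10-14T16:56:03.719Z info: [Crawler][897] Will crawl "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" for link with id "k1l8zj5ixpgj9hugbvmibqfc"
2024-10-14T16:56:11.312Z info: [Crawler][897] Will attempt to archive page ...
"""

if __name__ == "__main__":
    print(looping_jobs(summarize_crawler_jobs(SAMPLE)))  # prints ['897']
```

Job 892 starts once and completes, while job 897 starts again after reaching the archive step, which matches the suspicion that the full page archive never finishes.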


@fifty-six commented on GitHub (Nov 13, 2024):

I have a similar issue with "https://www.npopov.com/2022/12/20/This-year-in-LLVM-2022.html", if that's any help.
