[GH-ISSUE #218] Crawler isn't crawling [Error when performing the request to....] #158

Closed
opened 2026-03-02 11:47:11 +03:00 by kerem · 3 comments
Owner

Originally created by @Brancliff on GitHub (Jun 12, 2024).
Original GitHub issue: https://github.com/karakeep-app/karakeep/issues/218

Hey! New user here. I just set everything up, and the main hitch right now is that Hoarder doesn't seem to be able to get any information from the links I add. No header image, no text, nothing. In the admin panel, I have a few "background jobs" lined up, but I've left it like that for a day and it hasn't progressed at all.

I also made sure to copy the links from the demo website, just in case the problem was the links themselves.

The container stack here has quite a few pieces; it's the "workers" container that I'd need to check to troubleshoot this, right? Here's what I kept getting in its container logs:

Internal Error: Error when performing the request to https://registry.npmjs.org/pnpm; for troubleshooting help, see https://github.com/nodejs/corepack#troubleshooting
at fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22882:11)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22896:20)
at async fetchLatestStableVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22948:20)
at async fetchLatestStableVersion2 (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22971:14)
at async Engine.getDefaultVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23349:25)
at async executePackageManagerRequest (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24207:28)
at async BinaryCommand.validateAndExecute (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:21173:22)
at async _Cli.run (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22148:18)
at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24279:12)

... Is that related to this at all? Or am I on the wrong trail entirely here?

kerem 2026-03-02 11:47:11 +03:00
  • closed this issue
  • added the question label

@kamtschatka commented on GitHub (Jun 12, 2024):

Yes, the worker is responsible for filling everything in with data. It seems like you have connectivity issues (or at least your Docker container has), so pnpm, which is used to download all the dependencies for the worker, cannot be set up correctly.
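To confirm whether the container can actually reach the URL corepack is failing on, a quick check like this might help (the container name is a placeholder, and the availability of curl inside the image is an assumption; busybox-based images may only ship wget):

```shell
# Run from a shell inside the workers container,
# e.g. `docker exec -it <workers-container> sh`.
# This fetches the exact URL from the corepack error message.
curl -fsS https://registry.npmjs.org/pnpm -o /dev/null \
  && echo "registry reachable" \
  || echo "registry NOT reachable"
```

If this prints "registry NOT reachable", the problem is DNS or outbound networking for the container, not Hoarder itself.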


@Brancliff commented on GitHub (Jun 13, 2024):

I got my connection problems sorted out. I was able to ping Google from a shell inside both the web and workers containers, so I know they can access the internet. I'm getting a new error now:

Node.js v21.7.3
 ELIFECYCLE  Command failed with exit code 1.
> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts
2024-06-13T08:13:39.968Z info: Workers version: 0.14.0
2024-06-13T08:13:40.002Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:35) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-06-13T08:13:40.042Z info: [Crawler] Successfully resolved IP address, new address: http://172.29.44.2:9222/
2024-06-13T08:13:41.212Z info: Starting crawler worker ...
2024-06-13T08:13:41.220Z info: Starting inference worker ...
2024-06-13T08:13:41.231Z info: Starting search indexing worker ...
2024-06-13T08:13:41.559Z error: [Crawler][9] Crawling job failed: SqliteError: no such table: bookmarks
/app/apps/workers/node_modules/.pnpm/better-sqlite3@9.4.3/node_modules/better-sqlite3/lib/methods/wrappers.js:5
	return this[cppdb].prepare(sql, this, false);
	                   ^
SqliteError: no such table: bookmarkLinks
    at Database.prepare (/app/apps/workers/node_modules/.pnpm/better-sqlite3@9.4.3/node_modules/better-sqlite3/lib/methods/wrappers.js:5:21)
    at BetterSQLiteSession.prepareQuery (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/better-sqlite3/session.cjs:42:30)
    at BetterSQLiteSession.prepareOneTimeQuery (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/session.cjs:91:17)
    at QueryPromise._prepare (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:101:81)
    at QueryPromise.run (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:111:17)
    at QueryPromise.execute (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:123:54)
    at QueryPromise.then (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/query-promise.cjs:44:17)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 'SQLITE_ERROR'
}      

After that, this whole error log basically loops again.

Some more info: I have 3 test bookmarks and am stuck at 3 pending crawling jobs. No indexing jobs, and 3 pending inference jobs.


@MohamedBassem commented on GitHub (Jun 13, 2024):

This error usually indicates that the data directories of the workers and web containers are not the same. They should be the same, since the two containers share the same database.
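For reference, the kind of alignment described here looks roughly like this in a compose file; the service names, volume name, and mount path below are illustrative placeholders, not taken from this issue:

```yaml
# Both containers point DATA_DIR at the same mount of the same volume,
# so the migrations run by the web container create the tables
# (bookmarks, bookmarkLinks, ...) in the database the workers read.
services:
  web:
    environment:
      - DATA_DIR=/data
    volumes:
      - data:/data
  workers:
    environment:
      - DATA_DIR=/data
    volumes:
      - data:/data   # same named volume, same path as the web service
volumes:
  data:
```

If the two services mount different volumes (or different host paths), the workers open an empty SQLite file and every query fails with "no such table".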
