[PR #2058] [CLOSED] fix(metascraper): Ensure metascraper plugins respect proxy settings #1963

Closed
opened 2026-03-02 11:59:57 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/2058
Author: @AdrianAcala
Created: 10/20/2025
Status: Closed

Base: main ← Head: fix/metascraper-proxy-settings-1863


📝 Commits (1)

  • ccf803b fix(metascraper): Ensure metascraper plugins respect proxy settings

📊 Changes

3 files changed (+267 additions, -26 deletions)


📝 apps/workers/utils.ts (+257 -24)
📝 apps/workers/workers/crawlerWorker.ts (+6 -2)
📝 packages/shared/config.ts (+4 -0)

📄 Description

Metascraper plugins such as metascraperLogo make their own internal HTTP requests to fetch logos and metadata, and those requests do not respect the application's proxy settings. As a result, crawling jobs timed out in environments where outbound access is only available through a proxy.

The fix temporarily overrides the global fetch function with fetchWithProxy during metadata extraction, ensuring all HTTP requests made by metascraper plugins go through the configured proxy. The original fetch function is restored in a finally block to ensure proper cleanup.
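The override-and-restore pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `withFetchOverride` and `FetchLike` are hypothetical names, and the replacement function stands in for `fetchWithProxy`, whose real signature is not shown here.

```typescript
// Hypothetical, simplified fetch signature used only for this sketch.
type FetchLike = (input: string, init?: unknown) => Promise<unknown>;

// Run `task` while globalThis.fetch points at `replacement`,
// then put the original function back, even if `task` throws.
async function withFetchOverride<T>(
  replacement: FetchLike,
  task: () => Promise<T>,
): Promise<T> {
  const g = globalThis as unknown as { fetch: FetchLike };
  const originalFetch = g.fetch;
  g.fetch = replacement;
  try {
    return await task();
  } finally {
    // Cleanup runs on both success and failure.
    g.fetch = originalFetch;
  }
}

// Usage sketch: `proxiedFetch` is a stand-in for fetchWithProxy (assumption).
const seenUrls: string[] = [];
const proxiedFetch: FetchLike = async (input) => {
  seenUrls.push(input); // a real implementation would forward via the proxy
  return { ok: true };
};

async function extractMetadataThroughProxy(url: string): Promise<unknown> {
  return withFetchOverride(proxiedFetch, async () => {
    // Any code that calls the global fetch in here sees the replacement.
    const g = globalThis as unknown as { fetch: FetchLike };
    return g.fetch(url);
  });
}
```

Because the override is process-wide, any other code that calls the global fetch while the task is running would also go through the replacement; that trade-off is worth keeping in mind if extraction jobs run concurrently.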

With this approach, every HTTP request issued during metadata extraction goes through the configured proxy, as the issue description requires.

Also adds an e2e test that verifies metadata extraction with proxy settings.

Fixes #1863


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/karakeep#1963