[PR #2108] [MERGED] fix: Fix Amazon product image extraction on amazon.com URLs #1980

Closed
opened 2026-03-02 12:00:02 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/2108
Author: @Yeraze
Created: 11/9/2025
Status: Merged
Merged: 12/14/2025
Merged by: @MohamedBassem

Base: main ← Head: fix/amazon-image-extraction


📝 Commits (3)

  • 271d73d fix: Fix Amazon product image extraction on amazon.com URLs
  • 3a336f9 Merge branch 'main' into fix/amazon-image-extraction
  • a4922e9 Merge branch 'main' into fix/amazon-image-extraction

📊 Changes

2 files changed (+79 additions, -0 deletions)


➕ apps/workers/metascraper-plugins/metascraper-amazon-improved.ts (+77 -0)
📝 apps/workers/workers/crawlerWorker.ts (+2 -0)

📄 Description

The metascraper-amazon package extracts the first .a-dynamic-image element, which on amazon.com is often the Prime logo rather than the product image. On amazon.co.uk this happens to work, because the product image appears first in the DOM.
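A minimal sketch of the failure mode, assuming a heavily simplified page model. ImgNode, firstDynamicImage, the two page layouts and the example.com URLs are all hypothetical and stand in for a real parsed DOM. The sketch only reproduces the "first .a-dynamic-image in document order wins" behaviour described above; it is not the metascraper-amazon source.

```typescript
// Hypothetical, simplified stand-in for an <img> element.
// A real scraper works on a parsed DOM; this keeps only what the selector checks.
interface ImgNode {
  id?: string;
  classes: string[];
  src: string;
}

// Rough equivalent of taking the first ".a-dynamic-image" match:
// whichever element comes first in document order wins.
function firstDynamicImage(nodes: ImgNode[]): string | undefined {
  return nodes.find((n) => n.classes.includes("a-dynamic-image"))?.src;
}

// Hypothetical elements, both carrying the a-dynamic-image class.
const primeLogo: ImgNode = {
  classes: ["a-dynamic-image"],
  src: "https://example.com/prime-logo.png",
};
const productShot: ImgNode = {
  id: "landingImage",
  classes: ["a-dynamic-image"],
  src: "https://example.com/product.jpg",
};

// Assumed layouts: on amazon.com the Prime logo precedes the product image,
// on amazon.co.uk the product image comes first.
const amazonCom: ImgNode[] = [primeLogo, productShot];
const amazonCoUk: ImgNode[] = [productShot, primeLogo];

console.log(firstDynamicImage(amazonCom)); // https://example.com/prime-logo.png
console.log(firstDynamicImage(amazonCoUk)); // https://example.com/product.jpg
```

The same selector gives different answers purely because of element order, which is why a more specific selector is needed rather than a different "first match".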

This PR adds a custom metascraper plugin (metascraper-amazon-improved) that uses more specific selectors (#landingImage, #imgTagWrapperId, #imageBlock) to target the actual product image. Because the plugin is placed before metascraperAmazon() in the chain, it fixes image extraction while preserving all other Amazon metadata (title, brand, description).
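The selector priority and the plugin ordering can be sketched together. This is a hypothetical, dependency-free model, not the plugin from this PR: ImgNode, Rule, improvedImageRule, upstreamImageRule and resolve are illustrative names, the ancestor-id check is an assumed approximation of the #imgTagWrapperId and #imageBlock selectors, and resolve assumes the chain tries rules in order and keeps the first value it gets, which is what makes the position before metascraperAmazon() matter. Real metascraper rules receive a parsed DOM rather than an array.

```typescript
// Hypothetical simplified <img>: its own id, classes and the ids of its ancestors.
interface ImgNode {
  id?: string;
  classes: string[];
  ancestorIds: string[];
  src: string;
}

// A rule returns a value for one metadata field, or undefined to defer.
type Rule = (page: ImgNode[]) => string | undefined;

// Hypothetical improved rule: try selectors from most to least specific.
const improvedImageRule: Rule = (page) => {
  const matchers: Array<(n: ImgNode) => boolean> = [
    (n) => n.id === "landingImage", // #landingImage
    (n) => n.ancestorIds.includes("imgTagWrapperId"), // image inside #imgTagWrapperId
    (n) => n.ancestorIds.includes("imageBlock"), // image inside #imageBlock
  ];
  for (const matches of matchers) {
    const hit = page.find(matches);
    if (hit) return hit.src;
  }
  return undefined; // nothing specific found: let a later rule answer
};

// Stand-in for the upstream behaviour: first .a-dynamic-image wins.
const upstreamImageRule: Rule = (page) =>
  page.find((n) => n.classes.includes("a-dynamic-image"))?.src;

// Assumed chain semantics: the first rule that yields a value wins.
function resolve(rules: Rule[], page: ImgNode[]): string | undefined {
  for (const rule of rules) {
    const value = rule(page);
    if (value !== undefined) return value;
  }
  return undefined;
}

// Hypothetical amazon.com-like page: Prime logo first, product image second.
const page: ImgNode[] = [
  { classes: ["a-dynamic-image"], ancestorIds: [], src: "https://example.com/prime-logo.png" },
  {
    id: "landingImage",
    classes: ["a-dynamic-image"],
    ancestorIds: ["imageBlock", "imgTagWrapperId"],
    src: "https://example.com/product.jpg",
  },
];

console.log(resolve([improvedImageRule, upstreamImageRule], page)); // https://example.com/product.jpg
console.log(resolve([upstreamImageRule, improvedImageRule], page)); // https://example.com/prime-logo.png
```

Swapping the two rules is enough to bring the Prime logo back, and a page without any of the specific selectors still falls through to the upstream rule, so other pages keep their existing behaviour.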

This fixes https://github.com/karakeep-app/karakeep/issues/2075

Tested with the following URLs to validate success:

  • https://www.amazon.co.uk/Philips-LED-GU10-Light-Bulbs/dp/B01KHILJ5O
  • https://www.amazon.com/dp/B099RSLNZ4?ref=ppx_yo2ov_dt_b_fed_asin_title&th=1

🤖 Generated with help from Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
Reference
starred/karakeep#1980