[PR #1542] 🧭 Crawler Wayback Fallback – Summary & Next Steps #1854

Open
opened 2026-03-02 11:59:29 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/karakeep-app/karakeep/pull/1542
Author: @SmolSoftBoi
Created: 6/6/2025
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (5)

  • d9d772a Default Wayback fallback env to true
  • 39acce5 Address review comments
  • fe9d82e Add Wayback Machine fallback with configurable env flag
  • b7f65bd Merge remote-tracking branch 'upstream/main'
  • 365f6ad Update apps/workers/workers/crawlerWorker.ts

📊 Changes

10 files changed (+141 additions, -78 deletions)


📝 .env.sample (+2 -1)
📝 README.md (+1 -0)
📝 apps/workers/workers/crawlerWorker.ts (+64 -1)
📝 docker/docker-compose.yml (+2 -0)
📝 docs/docs/03-configuration.md (+2 -75)
📝 kubernetes/.env_sample (+2 -1)
📝 kubernetes/web-deployment.yaml (+2 -0)
➕ packages/e2e_tests/tests/workers/crawlerWayback.test.ts (+63 -0)
📝 packages/e2e_tests/vitest.config.ts (+1 -0)
📝 packages/shared/config.ts (+2 -0)

📄 Description

What’s been done

  • Feature: Added optional Wayback Machine fallback when page crawling fails (CRAWLER_WAYBACK_FALLBACK).
  • Docs: Updated README, config guide, and example config files to document the flag.
  • Default: Fallback flag is enabled by default (true).
  • Tests: Added end-to-end test for Wayback fallback.
  • Deployment: Added env var to Docker Compose and Kubernetes configs.
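
The behaviour summarised above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `parseWaybackFlag`, `closestSnapshotUrl`, `crawlWithFallback`, `PageFetcher`, and `SnapshotLookup` are hypothetical, and the treatment of values other than `true` is an assumption. The response shape matches the public Wayback availability endpoint (`https://archive.org/wayback/available?url=...`); whether the PR uses that endpoint is not stated here.

```typescript
// Hypothetical sketch: none of these names come from the PR diff.

/** Subset of the Wayback availability API response used below. */
interface WaybackAvailability {
  archived_snapshots?: {
    closest?: { available?: boolean; url?: string; timestamp?: string };
  };
}

/** Unset or empty means enabled, matching the stated default of `true`. */
function parseWaybackFlag(raw: string | undefined): boolean {
  if (raw === undefined || raw.trim() === "") return true;
  // Assumption: only the literal string "true" (any case) enables the flag.
  return raw.trim().toLowerCase() === "true";
}

/** Extract the closest snapshot URL, or null when no snapshot is available. */
function closestSnapshotUrl(body: WaybackAvailability): string | null {
  const closest = body.archived_snapshots?.closest;
  return closest?.available && closest.url ? closest.url : null;
}

type PageFetcher = (url: string) => Promise<string>;
type SnapshotLookup = (url: string) => Promise<string | null>;

/** Try the live page first; on failure, retry against an archived snapshot. */
async function crawlWithFallback(
  url: string,
  fetchPage: PageFetcher,
  lookupSnapshot: SnapshotLookup,
  fallbackEnabled: boolean,
): Promise<{ html: string; source: "live" | "wayback" }> {
  try {
    return { html: await fetchPage(url), source: "live" };
  } catch (err) {
    // With the flag off, or with no snapshot, surface the original error.
    if (!fallbackEnabled) throw err;
    const snapshotUrl = await lookupSnapshot(url);
    if (snapshotUrl === null) throw err;
    return { html: await fetchPage(snapshotUrl), source: "wayback" };
  }
}
```

Passing the fetcher and the snapshot lookup in as parameters keeps the sketch testable without network access; a real lookup would presumably call the availability endpoint and hand the JSON body to `closestSnapshotUrl`.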

🛠️ Recommended Next Steps

  1. Release Notes / Changelog:

    • Summarise:

      • New fallback for robust crawling
      • Configurable with env var
      • Now default behaviour

🚀 Ready to support further improvements, just let me know!


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
