[PR #1364] [MERGED] Use COOKIES_FILE to fetch page titles #1391

Closed
opened 2026-03-01 14:49:35 +03:00 by kerem · 0 comments

## 📋 Pull Request Information

**Original PR:** https://github.com/ArchiveBox/ArchiveBox/pull/1364
**Author:** [@benmuth](https://github.com/benmuth)
**Created:** 2/27/2024
**Status:** ✅ Merged
**Merged:** 3/14/2024
**Merged by:** [@pirate](https://github.com/pirate)

**Base:** `dev` ← **Head:** `title-cookies-file`

---

### 📝 Commits (5)

- [`68326a6`](https://github.com/ArchiveBox/ArchiveBox/commit/68326a60ee20e2a8831ae86e9867b352e0f74ca6) Add cookies file to http request in `download_url`
- [`fe11e1c`](https://github.com/ArchiveBox/ArchiveBox/commit/fe11e1c2f47487b419497bac38aafbd433ed689a) check if COOKIE_FILE is file
- [`a577d1e`](https://github.com/ArchiveBox/ArchiveBox/commit/a577d1ed232101275383de2c96722c08436b9f30) Merge branch 'dev' into title-cookies-file
- [`4686da9`](https://github.com/ArchiveBox/ArchiveBox/commit/4686da91e6b11661c0e57397fe86886416d965d5) Fix cookies being set incorrectly
- [`5082d61`](https://github.com/ArchiveBox/ArchiveBox/commit/5082d61613b3ece7c99cb0f78a5b2d7fb08d2527) Merge branch 'title-cookies-file' of https://github.com/benmuth/ArchiveBox into title-cookies-file

### 📊 Changes

**1 file changed** (+16 additions, -2 deletions)

<details>
<summary>View changed files</summary>

📝 `archivebox/util.py` (+16 -2)

</details>

### 📄 Description

# Summary

This PR lets the title extractor use the `COOKIES_FILE`, if available. This helps avoid extracting the titles of captcha or login pages.

# Related issues

#761

# Changes these areas

- [ ] Bugfixes
- [x] Feature behavior
- [ ] Command line interface
- [x] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk

---

<sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
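
For readers without the diff at hand, the behaviour described in the Summary (send the cookies from `COOKIES_FILE` with the request that fetches a page's title, but only when that file actually exists) might look roughly like the sketch below. It is a standard-library illustration, not the code merged in `archivebox/util.py`: the helper names `load_cookie_jar` and `fetch_html`, the 60-second default timeout, and the assumption that the cookies file is in Netscape `cookies.txt` format are all hypothetical choices made for this example.

```python
import http.cookiejar
import urllib.request
from pathlib import Path
from typing import Optional


def load_cookie_jar(cookies_file: Optional[str]) -> http.cookiejar.MozillaCookieJar:
    """Return a jar filled from a Netscape-format cookies file.

    Hypothetical helper: if no path is configured, or the path is not a
    regular file, an empty jar is returned so the request proceeds without
    cookies instead of failing.
    """
    jar = http.cookiejar.MozillaCookieJar()
    if cookies_file and Path(cookies_file).is_file():
        # Keep session cookies and expired entries; exported cookie files
        # often contain both.
        jar.load(cookies_file, ignore_discard=True, ignore_expires=True)
    return jar


def fetch_html(url: str, cookies_file: Optional[str] = None, timeout: float = 60.0) -> str:
    """Download a page, attaching any cookies found in cookies_file.

    Hypothetical helper; the timeout default is an assumption.
    """
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(load_cookie_jar(cookies_file))
    )
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Checking that the path is a file before loading it mirrors the intent of the `check if COOKIE_FILE is file` commit: an unset or stale setting should fall back to a cookie-less request rather than raise an error.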