[PR #1671] [CLOSED] abx-plugin-title: use CURL_USER_AGENT when downloading page #4474

Closed

opened 2026-03-15 01:46:36 +03:00 by kerem · 0 comments

kerem commented

2026-03-15 01:46:36 +03:00

Owner

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1671
Author: @hydrargyrum
Created: 3/31/2025
Status: ❌ Closed

Base: dev ← Head: dev

📝 Commits (1)

d98663e abx-plugin-title: use CURL_USER_AGENT when downloading page

📊 Changes

2 files changed (+3 additions, -3 deletions)

📝 archivebox/misc/util.py (+2 -2)
📝 archivebox/pkgs/abx-plugin-title/abx_plugin_title/extractor.py (+1 -1)

📄 Description

abx-plugin-title fetches the title with whatever works first:

- reuse the page already downloaded by dom/singlepage/wget
- download the page with python-requests

The plugin documents/returns a curl command line, but it is never used. See https://github.com/ArchiveBox/ArchiveBox/issues/1670

At least we could mimic curl's behaviour when downloading the page with python-requests by using the CURL_USER_AGENT setting.
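
To illustrate the idea, here is a minimal sketch of the two-step fallback with the user agent applied to the python-requests download. It is not the code from this PR: `build_headers`, `extract_title`, `fetch_title`, the `cached_html` parameter, and the `CURL_USER_AGENT` value shown are all hypothetical stand-ins, and the real setting would come from ArchiveBox's configuration.

```python
import re
from html import unescape

# Hypothetical placeholder; in ArchiveBox the value would come from the
# CURL_USER_AGENT configuration setting, not a hard-coded constant.
CURL_USER_AGENT = "curl/8.0.0 (example value)"


def build_headers(user_agent: str = CURL_USER_AGENT) -> dict:
    """Return request headers that present the configured curl user agent."""
    return {"User-Agent": user_agent}


def extract_title(html: str):
    """Return the text of the first <title> element, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if not match:
        return None
    return unescape(match.group(1)).strip() or None


def fetch_title(url: str, cached_html=None, timeout: float = 10):
    """Try an already downloaded page first, then fall back to a fresh download."""
    # Step 1: reuse a page that another extractor already saved (assumed to be
    # passed in as a string here).
    if cached_html is not None:
        title = extract_title(cached_html)
        if title:
            return title

    # Step 2: download with python-requests, sending the curl user agent so the
    # server sees the same User-Agent header a curl download would send.
    import requests  # third-party dependency, imported only when needed

    response = requests.get(url, headers=build_headers(), timeout=timeout)
    return extract_title(response.text)
```

Calling `fetch_title(url, cached_html=saved_page)` never touches the network when the saved page already contains a title; only the fallback path sends the `User-Agent` header.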

Summary

Changes these areas

- [x] Bugfixes
- [ ] Feature behavior
- [ ] Command line interface
- [ ] Configuration options
- [ ] Internal architecture
- [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

kerem closed this issue and added the pull-request label 2026-03-15 01:46:36 +03:00
