[PR #419] [MERGED] Hugging Face provider: remove env side-effects, add client cache, and tune retries #724

Closed
opened 2026-03-13 21:05:53 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/AJaySi/ALwrity/pull/419
Author: @AJaySi
Created: 3/12/2026
Status: Merged
Merged: 3/12/2026
Merged by: @AJaySi

Base: main ← Head: codex/refactor-huggingface_provider.py-service


📝 Commits (1)

  • 7df7d87 Refine Hugging Face provider retries and client reuse

📊 Changes

1 file changed (+94 additions, -39 deletions)


📝 backend/services/llm_providers/huggingface_provider.py (+94 -39)

📄 Description

Motivation

  • Remove import-time environment loading and noisy prints so the provider relies on centralized bootstrap and environment management.
  • Reduce per-request overhead and avoid unnecessary fixed throttling that amplified latency during multi-model fallbacks.
  • Improve resilience and observability of fallback attempts while avoiding excessively long retry amplification.

Description

  • Removed dotenv import-time .env loading and print(...) side effects so the module no longer performs environment bootstrapping on import.
  • Removed unconditional time.sleep(1) throttling from both text and structured request paths.
  • Added a lightweight, thread-safe module-level client cache keyed by a hashed API key identifier and introduced get_huggingface_client(api_key) to reuse OpenAI/HF clients instead of reconstructing them per request.
  • Tuned tenacity retry policy from 6 long-backoff attempts to 3 attempts with a shorter exponential backoff and enabled reraise=True to avoid hidden latency amplification across fallback attempts.
  • Added debug logs that record fallback attempt counts and elapsed milliseconds per attempt (including model-not-found paths and the response_format fallback) to aid performance troubleshooting.
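The cache and retry changes above can be sketched roughly as follows. This is an illustrative stand-in, not the actual code from huggingface_provider.py: the real module uses tenacity and an OpenAI-compatible client, whereas this sketch uses a stub client and a plain loop that mimics tenacity's stop_after_attempt(3), short exponential backoff, and reraise=True behavior.

```python
import hashlib
import logging
import threading
import time

logger = logging.getLogger(__name__)

class _StubClient:
    """Hypothetical stand-in for the OpenAI-compatible HF client."""
    def __init__(self, api_key: str):
        self.api_key = api_key

# Module-level cache keyed by a hashed API key identifier, guarded by a lock
# so concurrent requests never construct two clients for the same key.
_client_cache: dict[str, _StubClient] = {}
_cache_lock = threading.Lock()

def get_huggingface_client(api_key: str) -> _StubClient:
    """Return a cached client per API key, creating it at most once."""
    # Hash the key so the raw secret is never used as a dict key.
    key_id = hashlib.sha256(api_key.encode()).hexdigest()[:16]
    with _cache_lock:
        client = _client_cache.get(key_id)
        if client is None:
            client = _StubClient(api_key)
            _client_cache[key_id] = client
        return client

def call_with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Roughly tenacity's stop_after_attempt(3) + wait_exponential + reraise.

    Logs elapsed milliseconds per attempt, mirroring the PR's debug logging.
    """
    for attempt in range(1, attempts + 1):
        start = time.monotonic()
        try:
            result = fn()
            logger.debug("attempt %d ok in %.0f ms",
                         attempt, (time.monotonic() - start) * 1000)
            return result
        except Exception:
            logger.debug("attempt %d failed in %.0f ms",
                         attempt, (time.monotonic() - start) * 1000)
            if attempt == attempts:
                raise  # reraise=True: surface the real exception, not a wrapper
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

Note the design trade-off the PR describes: with fixed throttling removed and only three short-backoff attempts, a multi-model fallback chain no longer multiplies six long waits per model into minutes of hidden latency.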

Testing

  • Ran python -m py_compile backend/services/llm_providers/huggingface_provider.py which completed successfully.

Codex Task: https://chatgpt.com/codex/tasks/task_e_69b0db61e5cc83288b87d42af975b303
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
