[PR #51] [MERGED] Add LiteLLM callbacks and smart quota detection #55

opened 2026-03-02 04:08:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/gadievron/raptor/pull/51
Author: @gadievron
Created: 12/23/2025
Status: Merged
Merged: 12/26/2025
Merged by: @danielcuthbert

Base: main ← Head: llm-visibility-and-quota-detection


📝 Commits (4)

  • ca8f817 Add LiteLLM callbacks and smart quota detection
  • e1124aa Fix Cursor bot issues on PR #51
  • 4eb8298 Fix inconsistent error messages between generate methods
  • 7c2d65a Fix model_config parameter being silently ignored in tests

📊 Changes

7 files changed (+1289 additions, -23 deletions)


📝 packages/llm_analysis/llm/client.py (+354 -23)
packages/llm_analysis/tests/__init__.py (+1 -0)
packages/llm_analysis/tests/test_llm_callbacks.py (+251 -0)
packages/llm_analysis/tests/test_llm_callbacks_instructor.py (+101 -0)
packages/llm_analysis/tests/test_llm_callbacks_providers.py (+230 -0)
packages/llm_analysis/tests/test_ollama_warning.py (+158 -0)
packages/llm_analysis/tests/test_quota_detection.py (+194 -0)

📄 Description

Adds real-time model visibility via LiteLLM callbacks and intelligent quota/rate-limit detection with simplified error messages.

Callbacks (Model Visibility)

Real-time visibility into LiteLLM model usage:

  • Add RaptorLLMLogger class with success/failure callbacks
  • Register callbacks with LiteLLM on client init
  • Log model, tokens, duration for every call
  • Sanitize API keys in error messages
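The success/failure callback idea above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the PR's RaptorLLMLogger, not the actual implementation; the class name, the key-matching regex, and the token prefixes are illustrative assumptions.

```python
import re
import time

# Hypothetical: mask anything that looks like a provider API key
# (e.g. "sk-..." or "AIza..." prefixes) before it reaches the logs.
API_KEY_RE = re.compile(r"(sk-|AIza)[A-Za-z0-9_\-]{10,}")

def sanitize(text: str) -> str:
    """Replace API-key-shaped substrings with a redaction marker."""
    return API_KEY_RE.sub("[REDACTED]", text)

class LLMLoggerSketch:
    """Illustrative success/failure logger; the real RaptorLLMLogger
    registers with LiteLLM's callback mechanism at client init."""

    def on_success(self, model: str, tokens: int, started: float) -> None:
        duration = time.monotonic() - started
        print(f"[LLM] {model}: {tokens} tokens in {duration:.2f}s")

    def on_failure(self, model: str, error: Exception) -> None:
        # A callback exception must never break the underlying call,
        # so log defensively and swallow any logging error.
        try:
            print(f"[LLM] {model} failed: {sanitize(str(error))}")
        except Exception:
            pass
```

Note the swallow-and-continue in `on_failure`: per the summary below, callback exceptions do not break calls.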

Quota Detection (Simplified)

Intelligent quota/rate limit error detection:

  • Simple detection messages ("→ Provider rate limit exceeded")
  • Shows LiteLLM's actual error message (contains provider details)
  • Zero maintenance burden (no URLs/models/pricing to track)
  • API key sanitization protects sensitive data

Changes:

  • Add _is_quota_error() (type-based + string-based)
  • Add _get_quota_guidance() (simple detection messages)
  • Enhance exception handlers to show provider messages
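The type-based plus string-based detection described above might look roughly like this. It is a hedged sketch: the marker strings are guesses, and the `RateLimitError` class here is a stand-in for the provider/LiteLLM exception types the real `_is_quota_error()` checks.

```python
# Illustrative markers; the actual detection strings are assumptions.
QUOTA_MARKERS = ("rate limit", "quota", "resource_exhausted", "429")

class RateLimitError(Exception):
    """Stand-in for a provider rate-limit exception type."""

def is_quota_error(exc: Exception) -> bool:
    # Type-based check first, then fall back to substring matching
    # on the (lowercased) error message.
    if isinstance(exc, RateLimitError):
        return True
    message = str(exc).lower()
    return any(marker in message for marker in QUOTA_MARKERS)

def quota_guidance(exc: Exception) -> str:
    # Simple detection line, followed by the provider's own error
    # text, which already carries the useful details.
    return f"-> Provider rate limit exceeded\n{exc}"
```

Keeping the guidance to a fixed line plus the provider's message is what gives the "zero maintenance burden" property: no URLs, model lists, or pricing tables to keep in sync.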

Testing: 31/31 tests passing (100%)

  • 10 callback tests (all passing)
  • 21 quota detection tests (all passing)

Fixes: Gemini quota exhaustion issue (Dec 2025)
Breaking Changes: None


Note

Introduces real-time model visibility and clearer, safer error handling for LLM calls.

  • Callbacks/visibility: Add RaptorLLMLogger (singleton) and register with LiteLLM; logs model, tokens, duration; sanitizes errors; does not break calls on callback exceptions
  • Quota detection: New _is_quota_error() (type + string matching) and _get_quota_guidance() with concise provider messages; integrated into generate/generate_structured error paths
  • Fallback behavior: Restrict fallbacks to same tier (local↔local, cloud↔cloud); track actual attempts; add user-visible prints for model selection, retries, and cache hits
  • Safety/health checks: Require LiteLLM presence, enable litellm.redact_message_input_output_from_logging, warn when no cloud API keys, and warn when using ollama for exploit PoCs
  • API ergonomics: Allow overriding model via model_config kwarg in generate and generate_structured
  • Tests: Add suites for callbacks (registration, success/failure, retries, performance, Instructor), provider compatibility, Ollama warning content/format, and quota detection scenarios
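The same-tier fallback restriction in the summary above (local↔local, cloud↔cloud) could be sketched as a simple filter. The tier heuristic and model names below are illustrative assumptions, not the PR's actual logic.

```python
# Hypothetical tier heuristic: treat ollama/ models as local,
# everything else as cloud.
LOCAL_PREFIXES = ("ollama/",)

def tier(model: str) -> str:
    return "local" if model.startswith(LOCAL_PREFIXES) else "cloud"

def same_tier_fallbacks(primary: str, candidates: list[str]) -> list[str]:
    """Keep only fallback models in the same tier as the primary,
    excluding the primary itself."""
    want = tier(primary)
    return [m for m in candidates if m != primary and tier(m) == want]
```

With a local primary such as `ollama/llama3`, cloud candidates are filtered out, so a quota failure on one cloud provider never silently downgrades a cloud run to a local model (or vice versa).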

Written by Cursor Bugbot for commit 7c2d65a08d. This will update automatically on new commits.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
