[PR #51] [MERGED] Add LiteLLM callbacks and smart quota detection #55

opened 2026-03-02 04:08:04 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/gadievron/raptor/pull/51
Author: @gadievron
Created: 12/23/2025
Status: Merged
Merged: 12/26/2025
Merged by: @danielcuthbert

Base: main ← Head: llm-visibility-and-quota-detection


📝 Commits (4)

  • ca8f817 Add LiteLLM callbacks and smart quota detection
  • e1124aa Fix Cursor bot issues on PR #51
  • 4eb8298 Fix inconsistent error messages between generate methods
  • 7c2d65a Fix model_config parameter being silently ignored in tests

📊 Changes

7 files changed (+1289 additions, -23 deletions)


📝 packages/llm_analysis/llm/client.py (+354 -23)
packages/llm_analysis/tests/__init__.py (+1 -0)
packages/llm_analysis/tests/test_llm_callbacks.py (+251 -0)
packages/llm_analysis/tests/test_llm_callbacks_instructor.py (+101 -0)
packages/llm_analysis/tests/test_llm_callbacks_providers.py (+230 -0)
packages/llm_analysis/tests/test_ollama_warning.py (+158 -0)
packages/llm_analysis/tests/test_quota_detection.py (+194 -0)

📄 Description

Adds real-time model visibility via LiteLLM callbacks and intelligent quota/rate-limit detection with simplified error messages.

Callbacks (Model Visibility)

Real-time visibility into LiteLLM model usage:

  • Add RaptorLLMLogger class with success/failure callbacks
  • Register callbacks with LiteLLM on client init
  • Log model, tokens, duration for every call
  • Sanitize API keys in error messages
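The success/failure callback idea above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for the PR's RaptorLLMLogger, not the actual implementation; the class name, the key-matching regex, and the token prefixes are illustrative assumptions.

```python
import re
import time

# Hypothetical: mask anything that looks like a provider API key
# (e.g. "sk-..." or "AIza..." prefixes) before it reaches the logs.
API_KEY_RE = re.compile(r"(sk-|AIza)[A-Za-z0-9_\-]{10,}")

def sanitize(text: str) -> str:
    """Replace API-key-shaped substrings with a redaction marker."""
    return API_KEY_RE.sub("[REDACTED]", text)

class LLMLoggerSketch:
    """Illustrative success/failure logger; the real RaptorLLMLogger
    registers with LiteLLM's callback mechanism at client init."""

    def on_success(self, model: str, tokens: int, started: float) -> None:
        duration = time.monotonic() - started
        print(f"[LLM] {model}: {tokens} tokens in {duration:.2f}s")

    def on_failure(self, model: str, error: Exception) -> None:
        # A callback exception must never break the underlying call,
        # so log defensively and swallow any logging error.
        try:
            print(f"[LLM] {model} failed: {sanitize(str(error))}")
        except Exception:
            pass
```

Note the swallow-and-continue in `on_failure`: per the summary below, callback exceptions do not break calls.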

Quota Detection (Simplified)

Intelligent quota/rate limit error detection:

  • Simple detection messages ("→ Provider rate limit exceeded")
  • Shows LiteLLM's actual error message (contains provider details)
  • Zero maintenance burden (no URLs/models/pricing to track)
  • API key sanitization protects sensitive data

Changes:

  • Add _is_quota_error() (type-based + string-based)
  • Add _get_quota_guidance() (simple detection messages)
  • Enhance exception handlers to show provider messages
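The type-based plus string-based detection described above might look roughly like this. It is a hedged sketch: the marker strings are guesses, and the `RateLimitError` class here is a stand-in for the provider/LiteLLM exception types the real `_is_quota_error()` checks.

```python
# Illustrative markers; the actual detection strings are assumptions.
QUOTA_MARKERS = ("rate limit", "quota", "resource_exhausted", "429")

class RateLimitError(Exception):
    """Stand-in for a provider rate-limit exception type."""

def is_quota_error(exc: Exception) -> bool:
    # Type-based check first, then fall back to substring matching
    # on the (lowercased) error message.
    if isinstance(exc, RateLimitError):
        return True
    message = str(exc).lower()
    return any(marker in message for marker in QUOTA_MARKERS)

def quota_guidance(exc: Exception) -> str:
    # Simple detection line, followed by the provider's own error
    # text, which already carries the useful details.
    return f"-> Provider rate limit exceeded\n{exc}"
```

Keeping the guidance to a fixed line plus the provider's message is what gives the "zero maintenance burden" property: no URLs, model lists, or pricing tables to keep in sync.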

Testing: 31/31 tests passing (100%)

  • 10 callback tests (all passing)
  • 21 quota detection tests (all passing)

Fixes: Gemini quota exhaustion issue (Dec 2025)
Breaking Changes: None


Note

Introduces real-time model visibility and clearer, safer error handling for LLM calls.

  • Callbacks/visibility: Add RaptorLLMLogger (singleton) and register with LiteLLM; logs model, tokens, duration; sanitizes errors; does not break calls on callback exceptions
  • Quota detection: New _is_quota_error() (type + string matching) and _get_quota_guidance() with concise provider messages; integrated into generate/generate_structured error paths
  • Fallback behavior: Restrict fallbacks to same tier (local↔local, cloud↔cloud); track actual attempts; add user-visible prints for model selection, retries, and cache hits
  • Safety/health checks: Require LiteLLM presence, enable litellm.redact_message_input_output_from_logging, warn when no cloud API keys, and warn when using ollama for exploit PoCs
  • API ergonomics: Allow overriding model via model_config kwarg in generate and generate_structured
  • Tests: Add suites for callbacks (registration, success/failure, retries, performance, Instructor), provider compatibility, Ollama warning content/format, and quota detection scenarios
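The same-tier fallback restriction in the summary above (local↔local, cloud↔cloud) could be sketched as a simple filter. The tier heuristic and model names below are illustrative assumptions, not the PR's actual logic.

```python
# Hypothetical tier heuristic: treat ollama/ models as local,
# everything else as cloud.
LOCAL_PREFIXES = ("ollama/",)

def tier(model: str) -> str:
    return "local" if model.startswith(LOCAL_PREFIXES) else "cloud"

def same_tier_fallbacks(primary: str, candidates: list[str]) -> list[str]:
    """Keep only fallback models in the same tier as the primary,
    excluding the primary itself."""
    want = tier(primary)
    return [m for m in candidates if m != primary and tier(m) == want]
```

With a local primary such as `ollama/llama3`, cloud candidates are filtered out, so a quota failure on one cloud provider never silently downgrades a cloud run to a local model (or vice versa).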

Written by Cursor Bugbot for commit 7c2d65a08d. This will update automatically on new commits.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
