[PR #48] [MERGED] Add defensive handling for LLM response field validation #52

Closed
opened 2026-03-02 04:08:02 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/gadievron/raptor/pull/48
Author: @gadievron
Created: 12/22/2025
Status: Merged
Merged: 12/26/2025
Merged by: @danielcuthbert

Base: main ← Head: fix/bug-42-sanitizer-attributeerror


📝 Commits (1)

  • 5e3b21c Add defensive handling for LLM response field validation

📊 Changes

1 file changed (+13 additions, -1 deletions)


📝 packages/codeql/autonomous_analyzer.py (+13 -1)

📄 Description

Summary

Adds robust error handling for variable LLM response structures to prevent AttributeError when unexpected fields are returned during autonomous vulnerability assessment.

Problem

The LLM can return various response shapes depending on the Instructor configuration:

  • Dict (no attributes)
  • Pydantic model (with all fields)
  • Pydantic model (with extra fields not in schema)
  • Custom structure (may lack expected fields)

When the response includes unexpected fields such as sanitizers that are not part of the VulnerabilityAnalysis schema, creating the dataclass can fail with a TypeError or lead to AttributeErrors.
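The response shapes above can be normalized to a plain dict before validation. This is a minimal sketch, not the code in this PR; the function name and the fallback order are illustrative assumptions:

```python
def response_to_dict(resp):
    """Normalize an LLM/Instructor-style response to a plain dict (sketch)."""
    if isinstance(resp, dict):
        return resp                  # already a plain dict, no attributes to read
    if hasattr(resp, "model_dump"):
        return resp.model_dump()     # Pydantic v2 model
    if hasattr(resp, "dict"):
        return resp.dict()           # Pydantic v1 model
    return vars(resp)                # custom object: fall back to its __dict__
```

Once everything is a dict, a single field-filtering step (as in this PR) covers all four cases.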

Root Cause

Instructor library can return different response types based on:

  • Model capabilities
  • Schema complexity
  • Pydantic configuration
  • Fallback behavior

The current code assumes response_dict exactly matches the VulnerabilityAnalysis schema.
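A runnable repro of the failure mode, using a hypothetical two-field stand-in for VulnerabilityAnalysis (the real schema has more fields):

```python
from dataclasses import dataclass

@dataclass
class VulnerabilityAnalysis:  # hypothetical stand-in schema
    severity: str
    description: str

# Simulated LLM response containing a key the schema does not declare.
response_dict = {
    "severity": "high",
    "description": "tainted input reaches SQL query",
    "sanitizers": [],  # unexpected extra field
}

try:
    VulnerabilityAnalysis(**response_dict)
    crashed = False
except TypeError:
    # A dataclass __init__ rejects unknown keyword arguments.
    crashed = True
```

This is exactly the crash reported in #42: the extra key makes direct `**` unpacking raise before any analysis happens.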

Changes

File: packages/codeql/autonomous_analyzer.py

Lines: 290-302

Added defensive field filtering:

  • Filter LLM response_dict to only include valid VulnerabilityAnalysis fields
  • Log unexpected fields for debugging (e.g., 'sanitizers' not in schema)
  • Prevent TypeError from extra fields in LLM responses
```python
# Filter to only valid fields
valid_fields = {f.name for f in VulnerabilityAnalysis.__dataclass_fields__.values()}
filtered_response = {k: v for k, v in response_dict.items() if k in valid_fields}

# Log unexpected fields
unexpected_fields = set(response_dict.keys()) - valid_fields
if unexpected_fields:
    self.logger.debug(f"LLM response included unexpected fields (ignored): {unexpected_fields}")

analysis = VulnerabilityAnalysis(**filtered_response)
```
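The same pattern can be exercised end to end with a hypothetical two-field stand-in for VulnerabilityAnalysis (the field names and sample response below are illustrative, not taken from the repo):

```python
from dataclasses import dataclass, fields

@dataclass
class VulnerabilityAnalysis:  # hypothetical stand-in schema
    severity: str
    description: str

response_dict = {
    "severity": "high",
    "description": "tainted input reaches SQL query",
    "sanitizers": [],  # extra key the schema does not declare
}

# Same filtering pattern as the patch: keep only declared dataclass fields.
valid_fields = {f.name for f in fields(VulnerabilityAnalysis)}
filtered_response = {k: v for k, v in response_dict.items() if k in valid_fields}
unexpected_fields = set(response_dict) - valid_fields  # candidates for debug logging

analysis = VulnerabilityAnalysis(**filtered_response)
```

With the extra `sanitizers` key stripped, instantiation succeeds and the dropped key is available for the debug log.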

Why This Fix is Correct

Defensive Programming

  • Handles unexpected response fields gracefully
  • Provides clear logging for debugging
  • Fails gracefully instead of crashing
  • Uses Python best practice (field filtering)

Design Philosophy

Aligns with RAPTOR's "defense-in-depth" approach:

  • External systems (LLMs) are unpredictable
  • Validate all external data
  • Degrade gracefully on unexpected input
  • Log warnings for investigation

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves error handling)

Impact

  • Risk: Very Low - Defensive addition, no existing behavior changed
  • Scope: Single LLM response handling point
  • Breaking: No - Only adds safety, doesn't remove functionality

Fixes #42


Note

Defensive LLM response handling

  • Filters response_dict to only include VulnerabilityAnalysis dataclass fields before instantiation to avoid TypeError from extra keys
  • Logs unexpected LLM response fields (e.g., sanitizers) at debug level for diagnostics

Affects packages/codeql/autonomous_analyzer.py in analyze_vulnerability; no functional changes elsewhere.

Written by Cursor Bugbot for commit 5e3b21c8c6. This will update automatically on new commits. Configure here.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference
starred/raptor#52