[GH-ISSUE #6] semantic_code_search fails with 'input length exceeds context length' on large codebases #3

Closed
opened 2026-03-03 12:01:26 +03:00 by kerem · 1 comment
Owner

Originally created by @alecmarcus on GitHub (Mar 1, 2026).
Original GitHub issue: https://github.com/ForLoopCodes/contextplus/issues/6

Originally assigned to: @ForLoopCodes on GitHub.

Description

semantic_code_search consistently fails with the error "the input length exceeds the context length" on codebases with 100+ files, regardless of query length or the top_k parameter.

Environment

  • Context+ installed via bunx contextplus
  • Ollama embedding models tested:
    • nomic-embed-text (8K context, 768d)
    • qwen3-embedding:8b (40K context, 4096d)
  • Both models work correctly when called directly via Ollama API (/api/embed)
  • semantic_identifier_search works fine on the same codebase with the same model
  • Codebase: ~130 source files across Rust, Swift, Kotlin, Python, TypeScript
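
For reference, the direct check mentioned above ("called directly via Ollama API") could look roughly like the sketch below. It assumes Ollama is listening on its default local address; the helper names `buildEmbedRequest` and `embedDirect` are made up for this illustration and are not part of Context+.

```typescript
// Minimal types for the parts of the Ollama /api/embed exchange used here.
interface EmbedRequest {
  model: string;
  input: string | string[];
}

interface EmbedResponse {
  embeddings: number[][];
}

// Assumption: Ollama is running locally on its default port.
const OLLAMA_URL = "http://localhost:11434";

// Build the request body separately so it can be inspected without a server.
function buildEmbedRequest(model: string, input: string | string[]): EmbedRequest {
  return { model, input };
}

// Send one embedding request and return one vector per input.
async function embedDirect(model: string, input: string | string[]): Promise<number[][]> {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(model, input)),
  });
  if (!res.ok) {
    // Surface the server's error text, e.g. a context-length failure.
    throw new Error(`embed failed (${res.status}): ${await res.text()}`);
  }
  const data = (await res.json()) as EmbedResponse;
  return data.embeddings;
}
```

Calling `embedDirect("nomic-embed-text", "DID")` against a running Ollama instance exercises the same endpoint the report tested by hand.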

Reproduction

  1. Set up Context+ MCP on a codebase with 100+ files
  2. Call semantic_code_search with any query (even a single word like "DID") and top_k: 1
  3. Error: the input length exceeds the context length
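
As a sketch of step 2, the tool call an MCP client sends might be shaped like this. The `tools/call` envelope follows the MCP JSON-RPC convention and `top_k` is named in this report; the `query` argument name and the `buildSearchCall` helper are assumptions made for illustration.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that builds the reproduction request.
function buildSearchCall(id: number, query: string, topK: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "semantic_code_search",
      // `top_k` comes from the report; `query` is an assumed argument name.
      arguments: { query, top_k: topK },
    },
  };
}

const reproCall = buildSearchCall(1, "DID", 1);
console.log(JSON.stringify(reproCall, null, 2));
```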

Analysis

The error is raised by Ollama, but the model's context limit does not appear to be the real constraint: testing qwen3-embedding:8b (40K context) directly via curl works fine for individual embeddings. The issue appears to be in semantic-search.ts, which likely batches too many file contents into a single embedding call, or concatenates file content before embedding rather than embedding files individually.

semantic_identifier_search (which embeds shorter symbol signatures rather than full file contents) works correctly on the same codebase with the same model, which supports the hypothesis that it's a file-content batching issue.
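
To illustrate why concatenation would trip the limit while per-file embedding would not, here is a rough back-of-the-envelope sketch. The 4-characters-per-token ratio is a crude heuristic rather than a real tokenizer, the file sizes are invented, and "8K context" is taken to mean 8192 tokens.

```typescript
// Rough heuristic, not a tokenizer: assume about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Return the indices of inputs whose estimated size exceeds the context window.
function findOversized(inputs: string[], contextTokens: number): number[] {
  const oversized: number[] = [];
  inputs.forEach((text, i) => {
    if (estimateTokens(text) > contextTokens) oversized.push(i);
  });
  return oversized;
}

// Hypothetical corpus: 130 files of 6,000 characters each.
const files = Array.from({ length: 130 }, () => "x".repeat(6000));

// Each file embedded on its own versus all files joined into one input.
const perFile = findOversized(files, 8192);
const concatenated = findOversized([files.join("\n")], 8192);

console.log(perFile.length, concatenated.length);
```

With these made-up sizes the script prints `0 1`: no individual file is over the window, but the joined input is, and by this estimate it would also exceed a 40K window.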

Expected Behavior

semantic_code_search should embed files individually (or in smaller chunks) and work on codebases of any size, bounded only by the embedding model's per-document context window.
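
A minimal sketch of the behavior described above, assuming a character-based chunk limit and a small fixed batch size. This is not the fix that shipped in Context+; `chunkText`, `embedFiles`, and the default limits are hypothetical, and the embedding backend is passed in as a function so the sketch does not depend on any particular API.

```typescript
// An embedding backend: takes a batch of texts, returns one vector per text.
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

interface SourceFile {
  path: string;
  content: string;
}

interface EmbeddedChunk {
  path: string;
  chunkIndex: number;
  vector: number[];
}

// Split text into pieces of at most `maxChars` characters.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Embed every file chunk by chunk, sending only `batchSize` chunks per call.
async function embedFiles(
  files: SourceFile[],
  embed: EmbedFn,
  maxChars = 8000, // assumed limit; tune to the model's context window
  batchSize = 8,   // assumed batch size
): Promise<EmbeddedChunk[]> {
  // Flatten all files into (path, chunkIndex, text) records.
  const pending: { path: string; chunkIndex: number; text: string }[] = [];
  for (const file of files) {
    chunkText(file.content, maxChars).forEach((text, chunkIndex) => {
      pending.push({ path: file.path, chunkIndex, text });
    });
  }

  const results: EmbeddedChunk[] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    const batch = pending.slice(i, i + batchSize);
    const vectors = await embed(batch.map((p) => p.text));
    batch.forEach((p, j) => {
      const vector = vectors[j];
      if (vector === undefined) throw new Error("backend returned too few vectors");
      results.push({ path: p.path, chunkIndex: p.chunkIndex, vector });
    });
  }
  return results;
}
```

Because each chunk is a separate input, no single request grows with the size of the codebase; only the number of requests does.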

Workaround

Use semantic_identifier_search or external grep for code search.

kerem 2026-03-03 12:01:26 +03:00
Author
Owner

@ForLoopCodes commented on GitHub (Mar 1, 2026):

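Fixed in contextplus@1.0.3, commit 02580c0 (https://github.com/ForLoopCodes/contextplus/commit/02580c0ec3595d56186b36ed62731fc445471ef8).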
