[GH-ISSUE #10] Vision mode not triggered when images selected in Claude CLI flow #7

Closed
opened 2026-03-13 14:17:53 +03:00 by kerem · 1 comment
Owner

Originally created by @KlementMultiverse on GitHub (Mar 6, 2026).
Original GitHub issue: https://github.com/7836246/cursor2api/issues/10

Problem

When using the `claude` CLI with image selection, the vision preprocessing is skipped even though `vision.enabled: true` is set in config.yaml. Images are passed directly to the Cursor API without OCR/vision processing:

  1. The image handling path in `src/openai-handler.ts` doesn't detect that CLI requests contain images
  2. No vision mode logic executes before sending to Cursor
  3. Related to #8 — users report images selected in the CLI are ignored

Root Cause

The Anthropic Messages API flow (used by the `claude` CLI) sends images in the `content` array as `ImageBlockParam` objects. The current vision preprocessing in `converter.ts` only processes OpenAI-style image objects (with `url` or `base64` fields in specific locations), not Anthropic-style image blocks.

`src/index.ts` routes `/v1/messages` requests directly to the converter without checking for image content first. The vision check should happen before protocol conversion, but currently happens only in `openai-handler.ts` (post-conversion).
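A minimal sketch of the missing check, assuming Anthropic-style content blocks; the interface names below mirror the Anthropic SDK shapes and are illustrative, not the project's actual types:

```typescript
// Shapes mirroring the Anthropic Messages API content blocks.
// These interface names are illustrative, not the project's actual types.
interface ImageBlockParam {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}
interface TextBlockParam {
  type: "text";
  text: string;
}
type ContentBlock = ImageBlockParam | TextBlockParam;
interface AnthropicMessage {
  role: string;
  content: string | ContentBlock[];
}

// The check that /v1/messages routing would need before protocol
// conversion: does any message carry an Anthropic-style image block?
function hasAnthropicImages(messages: AnthropicMessage[]): boolean {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((b) => b.type === "image")
  );
}
```

With a check like this in place, the handler could branch into vision preprocessing before handing the request to the converter.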

Expected Behavior

When Claude CLI sends a request with:

```json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
      {"type": "text", "text": "analyze this image"}
    ]
  }]
}
```

The system should:

  1. Detect image blocks in `messages[].content[]`
  2. Extract base64 data before calling Cursor API
  3. Run OCR or vision API (per `vision.mode` config)
  4. Replace image blocks with text description in the prompt
  5. Send text-only request to Cursor, inject vision results into system prompt
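The steps above could be sketched roughly as follows; the `describe` callback stands in for whichever OCR or vision backend `vision.mode` selects, and every name here is hypothetical rather than taken from the codebase:

```typescript
// Illustrative types mirroring Anthropic content blocks
// (not the project's actual types).
type Block =
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "text"; text: string };
interface Message {
  role: string;
  content: string | Block[];
}

// Steps 1-4: find image blocks, extract their base64 data, describe each
// image, and replace the block with a text stand-in. The collected
// descriptions can then be injected into the system prompt (step 5)
// by the caller before the text-only request goes to Cursor.
async function replaceImagesWithText(
  messages: Message[],
  describe: (base64Data: string, mediaType: string) => Promise<string>
): Promise<{ messages: Message[]; visionNotes: string[] }> {
  const visionNotes: string[] = [];
  const out: Message[] = [];
  for (const m of messages) {
    if (!Array.isArray(m.content)) {
      out.push(m);
      continue;
    }
    const blocks: Block[] = [];
    for (const b of m.content) {
      if (b.type === "image") {
        const note = await describe(b.source.data, b.source.media_type);
        visionNotes.push(note);
        blocks.push({ type: "text", text: `[Image description: ${note}]` });
      } else {
        blocks.push(b);
      }
    }
    out.push({ ...m, content: blocks });
  }
  return { messages: out, visionNotes };
}
```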

Why This Matters

The vision feature (v2.3.0) is only functional for OpenAI clients (ChatBox, LobeChat) but broken for the primary use case: Claude CLI integration with Claude Code. This defeats the purpose of image support in a Claude-focused proxy.

Solution Scope

Add a `preprocessImages()` function in `converter.ts` that:

  • Detects `ImageBlockParam` objects in Anthropic message format
  • Extracts and processes images before `cursor-client.ts` makes the API call
  • Handles both OCR and external vision API modes
  • Returns modified messages with vision results injected

Call this in the Anthropic message handler before converting to Cursor format.
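The mode dispatch inside such a `preprocessImages()` might look like this; the `"ocr"`/`"api"` mode names, the config shape, and both helper functions are assumptions for illustration, not the project's actual config keys or APIs:

```typescript
// Assumed shape of the vision section of config.yaml; real keys may differ.
interface VisionConfig {
  enabled: boolean;
  mode: "ocr" | "api";
}

// Hypothetical stand-ins for a local OCR pass and an external vision API call.
async function runLocalOcr(base64Data: string, mediaType: string): Promise<string> {
  return `OCR text extracted from ${mediaType} image`;
}
async function callVisionApi(base64Data: string, mediaType: string): Promise<string> {
  return `Vision API description of ${mediaType} image`;
}

// Dispatch on vision.mode, as the proposed preprocessImages() would.
async function describeImage(
  base64Data: string,
  mediaType: string,
  config: VisionConfig
): Promise<string> {
  if (!config.enabled) return "";
  return config.mode === "ocr"
    ? runLocalOcr(base64Data, mediaType)
    : callVisionApi(base64Data, mediaType);
}
```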


Contributed by Klement Gunndu

kerem closed this issue 2026-03-13 14:17:58 +03:00

@7836246 commented on GitHub (Mar 6, 2026):

Fixed in v2.3.2
Thanks for reporting this! You're absolutely right — the vision preprocessing was only wired up for OpenAI-style image payloads and completely missed the Anthropic `ImageBlockParam` format used by Claude CLI / Claude Code.
