[GH-ISSUE #62] agent pre recon failed #20

Closed
opened 2026-02-27 07:20:01 +03:00 by kerem · 6 comments

Originally created by @raltheo on GitHub (Feb 6, 2026).
Original GitHub issue: https://github.com/KeygraphHQ/shannon/issues/62

I killed it on the third retry because the elapsed time kept increasing but the pre-recon step never completed.

Output logs:

{
  "type": "workflowExecutionFailedEventAttributes",
  "failure": {
    "message": "Activity task failed",
    "cause": {
      "message": "Agent pre-recon failed output validation after 3 attempts",
      "source": "TypeScriptSDK",
      "stackTrace": "ApplicationFailure: Agent pre-recon failed output validation after 3 attempts\n    at ApplicationFailure.nonRetryable (/app/node_modules/@temporalio/common/lib/failure.js:207:16)\n    at runAgentActivity (file:///app/dist/temporal/activities.js:156:42)\n    at async Activity.execute (/app/node_modules/@temporalio/worker/lib/activity.js:101:20)\n    at async NativeConnection.withAbortSignal (/app/node_modules/@temporalio/worker/lib/connection.js:172:16)\n    at async Client.withAbortSignal (/app/node_modules/@temporalio/client/lib/base-client.js:65:16)\n    at async /app/node_modules/@temporalio/worker/lib/activity.js:161:32\n    at async /app/node_modules/@temporalio/worker/lib/worker.js:725:30",
      "applicationFailureInfo": {
        "type": "OutputValidationError",
        "nonRetryable": true,
        "details": {
          "payloads": [
            [
              {
                "agentName": "pre-recon",
                "attemptNumber": 3,
                "elapsed": 2998
              }
            ]
          ]
        }
      }
    },
    "activityFailureInfo": {
      "scheduledEventId": "11",
      "startedEventId": "12",
      "identity": "1@61ba91ce50e3",
      "activityType": {
        "name": "runPreReconAgent"
      },
      "activityId": "2",
      "retryState": "RETRY_STATE_NON_RETRYABLE_FAILURE"
    }
  },
  "retryState": "RETRY_STATE_RETRY_POLICY_NOT_SET",
  "workflowTaskCompletedEventId": "22"
}
kerem closed this issue 2026-02-27 07:20:01 +03:00

@Mr-Neutr0n commented on GitHub (Feb 6, 2026):

Hi @raltheo,

This error typically occurs when the pre-recon agent doesn't create the required deliverables/code_analysis_deliverable.md file. Here are some common causes and troubleshooting steps:

Common causes:

  1. Using Router mode with incompatible models - If you're using ROUTER=true with alternative providers (OpenAI, Ollama, etc.), some models may not follow Shannon's instructions as well as Claude models
  2. Repository path issues - The repo path might not be accessible from the Docker container
  3. Target URL not reachable - For local apps, use host.docker.internal instead of localhost

Troubleshooting steps:

  1. Check agent logs at: audit-logs/*/agents/pre-recon*.jsonl
  2. If using Router mode, try with direct Anthropic API (remove ROUTER=true)
  3. For local applications, ensure the URL uses host.docker.internal:
    ./shannon start URL=http://host.docker.internal:3000 REPO=/path/to/repo
    
  4. Verify the repository path exists and is accessible
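
The URL and repository checks in the steps above can be sketched as a small pre-flight function. This is a minimal sketch, not part of Shannon: the function name `preflight` and its messages are hypothetical, and it only covers the two checks that can be done from the host before running `./shannon start`.

```shell
# preflight URL REPO
# Hypothetical helper (not a Shannon command): warns about the two
# setup problems listed above and returns non-zero if either is found.
preflight() {
  _pf_url="$1"
  _pf_repo="$2"
  _pf_status=0

  # A localhost URL resolves to the container itself when used inside Docker.
  case "$_pf_url" in
    http://localhost*|https://localhost*|http://127.0.0.1*|https://127.0.0.1*)
      echo "warn: URL points at localhost; use host.docker.internal instead" >&2
      _pf_status=1
      ;;
  esac

  # The repository must exist and be readable on the host before it can be mounted.
  if [ ! -d "$_pf_repo" ] || [ ! -r "$_pf_repo" ]; then
    echo "warn: repository path is missing or unreadable: $_pf_repo" >&2
    _pf_status=1
  fi

  return "$_pf_status"
}
```

For example, `preflight http://host.docker.internal:3000 /path/to/repo && ./shannon start URL=http://host.docker.internal:3000 REPO=/path/to/repo` would only start the run when both checks pass. It cannot tell whether the path is visible from inside the container, so the agent logs remain the authoritative source.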

I've also submitted a PR (#68) to improve the error messaging with more detailed troubleshooting tips.

Could you share more details about your setup?

  • Are you using Router mode?
  • Is the target URL a local application?
  • What model/provider are you using?

@cliftonc commented on GitHub (Feb 8, 2026):

I had the same issue. In the UI I got an error that was hard to debug:

ApplicationFailure: Agent pre-recon failed output validation
    at ApplicationFailure.create (/app/node_modules/@temporalio/common/lib/failure.js:183:16)
    at runAgentActivity (file:///app/dist/temporal/activities.js:203:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Activity.execute (/app/node_modules/@temporalio/worker/lib/activity.js:101:20)
    at async NativeConnection.withAbortSignal (/app/node_modules/@temporalio/worker/lib/connection.js:172:16)
    at async Client.withAbortSignal (/app/node_modules/@temporalio/client/lib/base-client.js:65:16)
    at async /app/node_modules/@temporalio/worker/lib/activity.js:161:32
    at async /app/node_modules/@temporalio/worker/lib/worker.js:725:30

In the logs it was clear I forgot to top up my balance:

[2026-02-08 07:04:36] [PHASE] Starting: pre-recon
[2026-02-08 07:04:36] [AGENT] pre-recon: Starting (attempt 1)
[2026-02-08 07:04:39] [pre-recon] [LLM] Turn 1: Credit balance is too low
[2026-02-08 07:04:39] [AGENT] pre-recon: Failed - Output validation failed (3.0s $0.00)
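
A quick way to surface the real cause is to print each failed-agent line together with the line just before it. The sketch below assumes the log layout shown in the excerpt above (an `[AGENT] ...: Failed - ...` line preceded by the last `[LLM]` turn); the function name `failure_context` is hypothetical and the log path is passed in as an argument.

```shell
# failure_context LOGFILE
# Hypothetical helper: prints every "...: Failed - ..." line from the log
# together with the single line that precedes it.
failure_context() {
  # -F: match the text literally; -B1: include one line of leading context.
  grep -B1 -F ': Failed - ' "$1"
}
```

Run against the log above, this would show the `Credit balance is too low` turn directly above the `Output validation failed` line, which the dashboard error alone does not reveal.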

@ezl-keygraph commented on GitHub (Feb 8, 2026):

Hi @raltheo, thanks for reporting.

Could you please share the following details? They will help us reproduce the issue.

  1. Did you use a provider/model other than the default via ROUTER=true?
  2. How did you set the environment variables: in .env or with export? If you used .env, please share the names of the variables you set.
  3. Can you take a look at workflow.log, which should be at ./audit-logs/target-url_shannon_12345/workflow.log?

@h0pes commented on GitHub (Feb 8, 2026):

Sorry to jump on this issue, but I am getting the same error (more or less). After various tweaks, I was able to proceed a bit further but then again the preRecon agent reported a failure.
Here are some logs from the dashboard:

{
  "message": "Agent pre-recon failed output validation",
  "source": "TypeScriptSDK",
  "applicationFailureInfo": {
    "type": "OutputValidationError",
    "details": {
      "payloads": [
        {
          "agentName": "pre-recon",
          "attemptNumber": 1,
          "elapsed": 1414949
        }
      ]
    }
  }
}


ApplicationFailure: Agent pre-recon failed output validation
    at ApplicationFailure.create (/app/node_modules/@temporalio/common/lib/failure.js:183:16)
    at runAgentActivity (file:///app/dist/temporal/activities.js:203:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Activity.execute (/app/node_modules/@temporalio/worker/lib/activity.js:101:20)
    at async NativeConnection.withAbortSignal (/app/node_modules/@temporalio/worker/lib/connection.js:172:16)
    at async Client.withAbortSignal (/app/node_modules/@temporalio/client/lib/base-client.js:65:16)
    at async /app/node_modules/@temporalio/worker/lib/activity.js:161:32
    at async /app/node_modules/@temporalio/worker/lib/worker.js:725:30

{
  "agent": "pre-recon",
  "elapsedSeconds": 1413,
  "attempt": 1
}

{
  "webUrl": "https://1.2.3.4",
  "repoPath": "/target-repo",
  "workflowId": "1-2-3-4_shannon-1770564489723",
  "configPath": "./configs/my-app-config.yaml",
  "outputPath": "/app/output"
}

Shannon was started with:
./shannon start URL=https://1.2.3.4 REPO=/opt/myownrepohere CONFIG=./configs/my-app-config.yaml OUTPUT=/home/myuser/Downloads

Using .env, with Anthropic API key and max_tokens as per your README.md

The only thing I can see is these last lines of Workflow.log in the report output directory:

[2026-02-08 15:41:38] [pre-recon] [LLM] Turn 296: I see there are permission constraints. Let me proceed with synthesizing the report and note the schema location in the documentation. Let me now create the comprehensive analysis report and save it using the MCP tool:
[2026-02-08 15:51:44] [pre-recon] [LLM] Turn 297: API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
[2026-02-08 15:51:44] [AGENT] pre-recon: Failed - Output validation failed (23m 34s $6.86)
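
Two different LLM-level errors in this thread end in the same `Output validation failed` message. As a rough sketch, the last `[LLM]` line can be mapped to a likely cause; the function name `classify_llm_error` and the category labels are hypothetical, and only the two messages quoted in this thread are recognized.

```shell
# classify_llm_error LINE
# Hypothetical helper: maps an "[LLM]" log line to a coarse cause label.
# Only the two error texts quoted in this thread are handled.
classify_llm_error() {
  case "$1" in
    *"Credit balance is too low"*)
      echo "billing"
      ;;
    *"exceeded the "*" output token maximum"*)
      # The API error text itself points at CLAUDE_CODE_MAX_OUTPUT_TOKENS.
      echo "output-token-limit"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}
```

For the Turn 297 line above this prints `output-token-limit`. Whether raising `CLAUDE_CODE_MAX_OUTPUT_TOKENS` resolves it in this setup is not confirmed anywhere in the thread.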

@ezl-keygraph commented on GitHub (Feb 8, 2026):

Thanks for providing a detailed note, @h0pes. We'll try to reproduce the issue and push a fix as soon as possible.


@Yash-xoxo commented on GitHub (Feb 9, 2026):

Hey @rdhwan,

Looking at your logs, the issue is pretty clear - the agent is trying to install playwright but hitting network/permission errors during the installation process, which causes the whole pre-recon phase to fail.

Root Cause

From the error trace:

npm error code 1
npm error path /home/claude/.npm-global/lib/node_modules/playwright
npm error command failed
npm error command sh -c node install.js

The playwright post-install script is failing, probably because:

  1. Network issues downloading browser binaries (Chromium, Firefox, WebKit)
  2. Permission issues in /home/claude/.npm-global/
  3. Missing system dependencies for browser installation

Quick Fixes to Try

Option 1: Pre-install Playwright in the Docker Image

Modify the Dockerfile to include playwright with browsers already installed:

RUN npm install -g playwright && \
    npx playwright install --with-deps chromium

This way the browsers are baked into the image and don't need to download during runtime.

Option 2: Skip Playwright Browser Install

Set the PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD env var:

# In your docker-compose.yml or .env
PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1

Though this might break any agents that actually need browser automation.

Option 3: Fix Network/Proxy Issues

If you're behind a proxy or firewall, playwright's download might be getting blocked. Check:

# Add to your docker run or compose file
HTTP_PROXY=your-proxy
HTTPS_PROXY=your-proxy
NO_PROXY=localhost,127.0.0.1

Option 4: Increase Timeout

Sometimes it's just a slow network. Try increasing npm timeouts:

NPM_CONFIG_TIMEOUT=300000

Workaround for Now

@izi-maguila's suggestion about using repoPath might work, but honestly this looks like an environment/network issue rather than a logic bug.

You could also try:

  1. Running with --network=host if you're in a restrictive network
  2. Checking Docker logs to see if there are DNS resolution issues
  3. Manually pre-pulling the playwright browsers before running Shannon

Let me know which approach works for you, or if you need help modifying the Dockerfile to bake playwright in!
