mirror of
https://github.com/KeygraphHQ/shannon.git
synced 2026-04-25 17:45:53 +03:00
[GH-ISSUE #105] Pre-recon agent stalls: Temporal heartbeat timeout during sub-agent execution #36
Originally created by @lenkaiser on GitHub (Feb 9, 2026).
Original GitHub issue: https://github.com/KeygraphHQ/shannon/issues/105
Description
When running Shannon against a real codebase (Next.js + Express app, ~50 source files) using `CLAUDE_CODE_OAUTH_TOKEN`, the `runPreReconAgent` Temporal activity stalls and never completes. The pipeline testing mode (`PIPELINE_TESTING=true`) works fine end-to-end.
Environment
- `CLAUDE_CODE_OAUTH_TOKEN` (team subscription, `claude_max_5x` rate tier)
- http://host.docker.internal:3000
Steps to Reproduce
Observed Behavior
- `workflow.log` stops updating after the sub-agents start (logs are batched, not streamed during sub-agent execution)
- `runPreReconAgent` stuck on Attempt 1 of 50, with the last heartbeat 60+ minutes behind
- `claude` process inside the container is still alive and making API calls (confirmed via `docker exec ps aux` and Claude debug logs showing `Stream started - received first chunk`)
Temporal UI State
Heartbeat stopped ~10 minutes in, while the Claude process continued running for much longer.
Docker Logs
Analysis
The pre-recon agent uses Claude Code's `Task` tool to spawn sub-agents for parallel analysis. While these sub-agents run, the parent agent is blocked waiting, and during this time no Temporal heartbeats are sent. The sub-agents themselves are working fine (debug logs confirm active API streams), but Temporal's heartbeat mechanism doesn't account for the parent being blocked on child agent completion.
The pre-recon prompt encourages spawning 3+ parallel sub-agents per phase, and with a real codebase each sub-agent does 100+ tool calls. This easily takes 10-20+ minutes, far exceeding what appears to be the heartbeat timeout.
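The failure mode described above (a long await with no heartbeats in between) can be illustrated with a minimal TypeScript sketch. It is not Shannon's code and not necessarily how #108 fixed the problem: `withHeartbeat`, `work`, `beat`, and `intervalMs` are hypothetical names chosen for illustration. In a real Temporal activity, `beat` could be the `heartbeat` function exported by `@temporalio/activity`; here it is passed in as a callback so the sketch runs without the SDK.

```typescript
// Hypothetical helper (illustrative only, not taken from the Shannon codebase).
// Keeps reporting liveness on a timer while a long-running promise is pending,
// so the caller is not silent while it waits on sub-agents.
type Beat = () => void;

async function withHeartbeat<T>(
  work: () => Promise<T>, // the long-running step, e.g. the agent run
  beat: Beat,             // assumed to wrap the real heartbeat call
  intervalMs: number,     // should be well below the heartbeat timeout
): Promise<T> {
  beat(); // report once immediately, before the first interval elapses
  const timer = setInterval(beat, intervalMs);
  try {
    return await work();
  } finally {
    // Always stop the timer, whether the work resolved or threw.
    clearInterval(timer);
  }
}
```

The design choice is that the heartbeat is driven by wall-clock time rather than by agent output, so it keeps firing even when the parent agent emits nothing for 10-20 minutes. The trade-off is that a hung process would also keep heartbeating, so an overall activity timeout is still needed as a backstop.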
Expected Behavior
The `runPreReconAgent` activity should either:
Workaround Attempted
`PIPELINE_TESTING=true` completes successfully (13/13 agents, 78s, ~$0.55) but produces simulated results against example.com rather than testing the actual target.
@ppamorim commented on GitHub (Feb 9, 2026):
@lenkaiser I have the same issue. In addition, Shannon is unable to correctly detect the path of the repository; it keeps printing `"repoPath": "/target-repo"`. The documentation is unclear about whether the path should be the repo path under `$HOME` or the `.git` directory. I am currently starting Shannon with the command: `./shannon start URL=https://foo.com REPO=$HOME/Repository/foo`.
@lenkaiser commented on GitHub (Feb 9, 2026):
I'm not sure whether our problems are related. However, with Claude I was able to make this change, after which it continues with the pre-recon.
I then checked the logs with `cd /tmp/shannon && ./shannon logs ID=host-docker-internal_shannon-1770640616220` and I can see the process updating nicely (or executing the pre-recon). What is somewhat misleading is that the web UI isn't showing any progress. It just says the following:
@Yash-xoxo commented on GitHub (Feb 9, 2026):
Hey @terkaner and @goswamim,
I've been looking into this issue and I think I have some insights that might help resolve the heartbeat timeout problem.
Root Cause Analysis
From what I can see in the Temporal UI logs, the issue is that the pre-recon agent's heartbeat times out after 10 minutes while waiting for sub-agents to complete. The key indicator is the ~10 minute gap in the activity state where the heartbeat stops being renewed.
Looking at the Docker logs, the sub-agents (like Code QA authenticator) are actually running and processing, but the main coordinator seems to be stuck waiting without properly maintaining the heartbeat connection.
Why This Happens
The problem appears to be a mismatch between:
Basically, while Claude is busy analyzing the codebase through the sub-agents, the main workflow isn't sending heartbeats frequently enough to keep Temporal happy.
Suggested Solution
I agree with @goswamim's approach about the `--repoPath` flag being a workaround, but for a proper fix, I'd suggest:
The implementation would look something like:
For the Immediate Issue
@terkaner - for your specific case with the 525-file repo timing out, you could try:
- `--repoPath` workaround @goswamim mentioned
Let me know if you need help implementing any of these fixes or if you want me to submit a PR with the heartbeat improvements!
@ajmallesh commented on GitHub (Feb 9, 2026):
Thanks for the report! Fixed in #108.