[GH-ISSUE #76] Bug: Process continues executing after Ctrl+C due to interrupt handler race condition #28

Closed
opened 2026-02-27 10:21:51 +03:00 by kerem · 1 comment

Originally created by @LuoAndOrder on GitHub (Jan 20, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/76

Issue: Process continues executing after Ctrl+C due to interrupt handler race condition

Summary

When running ralph run and sending an interrupt signal (Ctrl+C), the process continues executing and printing output to the terminal instead of terminating cleanly. This occurs because two interrupt handlers race against each other, and when the main loop's handler wins, the PTY executor's cleanup code is skipped.

Environment

  • OS: macOS (Darwin), likely affects all Unix platforms
  • Terminal: Ghostty (reproduced), likely affects all terminals
  • Mode: Both TUI (-i) and non-TUI modes affected
  • Version: 2.0.9

Steps to Reproduce

  1. Start ralph in run mode:

    ralph run -c ralph.claude.yml -p "Write a long response about software architecture"
    
  2. Wait for the agent to start producing output

  3. Press Ctrl+C to interrupt

  4. Observe: Output continues to be printed to the terminal after the interrupt

Expected Behavior

  • Process terminates immediately on Ctrl+C
  • No further output is printed after the interrupt signal
  • Terminal returns to normal state

Actual Behavior

  • Output continues to be printed for some time after Ctrl+C
  • In TUI mode, the display continues updating for ~250ms
  • The process eventually terminates but not cleanly

Root Cause Analysis

The Race Condition

Two interrupt handlers compete when Ctrl+C is pressed:

1. Main loop interrupt handler (crates/ralph-cli/src/main.rs:1852-1872):

let outcome = tokio::select! {
    result = execute_future => result?,
    _ = interrupt_rx_clone.changed() => {
        // Kills process group directly, drops execute_future
        let _ = killpg(pgid, Signal::SIGTERM);
        tokio::time::sleep(Duration::from_millis(250)).await;
        let _ = killpg(pgid, Signal::SIGKILL);
        // ... cleanup ...
        cleanup_tui(tui_handle);  // TUI aborted AFTER 250ms sleep
        return Ok(reason);
    }
};

2. PTY executor interrupt handler (crates/ralph-adapters/src/pty_executor.rs:388-395):

_ = interrupt_rx.changed() => {
    if *interrupt_rx.borrow() {
        debug!("Interrupt received in observe mode, terminating");
        termination = TerminationType::UserInterrupt;
        should_terminate.store(true, Ordering::SeqCst);  // Signals reader thread
        let _ = self.terminate_child(&mut child, true).await;
        break;
    }
}

What Happens When Main Loop Wins

When the main loop's tokio::select! picks the interrupt branch:

  1. execute_future is dropped (cancelled), not awaited
  2. The PTY executor's cleanup code never runs
  3. should_terminate flag is never set
  4. The std::thread reader (spawned at pty_executor.rs:334) continues running
  5. In TUI mode, the reader sends output via tui_output_tx channel
  6. TUI continues displaying for 250ms until cleanup_tui() is called
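
The cancellation behavior in steps 1 to 3 can be reproduced without tokio. The following is a minimal std-only sketch, not the project's code: `execute`, `Never`, and `NoopWake` are hypothetical stand-ins, and polling the future once and then dropping it approximates what happens to the losing branch of `tokio::select!`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for "child still running".
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Hypothetical stand-in for the executor: the flag is set only after the await.
async fn execute(should_terminate: Arc<AtomicBool>) {
    Never.await;
    should_terminate.store(true, Ordering::SeqCst);
}

/// Polls the executor once, then drops it, as a losing select branch would be.
/// Returns whether the flag was ever set.
fn flag_after_drop() -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let mut fut = Box::pin(execute(Arc::clone(&flag)));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    drop(fut); // cancellation: nothing after the await point ever runs
    flag.load(Ordering::SeqCst)
}

fn main() {
    println!("flag set after drop: {}", flag_after_drop());
}
```

Dropping a future runs destructors for its captured state but never resumes it, so any cleanup written as ordinary statements after an await point is skipped.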

Why the Reader Thread Continues

The reader thread is a native OS thread (std::thread::spawn), not a tokio task, so dropping execute_future does not cancel it. It stops only when one of the following happens:

  • reader.read() returns EOF (child process died)
  • blocking_send() fails (channel receiver dropped)
  • The should_terminate flag is checked and found to be true

Since the PTY executor's cleanup is skipped, the flag is never set. The reader continues its loop until the child dies from killpg() and the PTY returns EOF.
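
The three exit conditions can be illustrated with a small std-only sketch. It is an approximation under stated assumptions, not the code in pty_executor.rs: `EndlessOutput` is a hypothetical stand-in for a PTY that keeps producing bytes, `std::sync::mpsc` stands in for the tokio channel, and `StopReason`, `reader_loop`, and `spawn_reader` are names invented for the example.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Why the reader loop stopped; one variant per exit condition.
#[derive(Debug, PartialEq)]
enum StopReason {
    Eof,
    ReceiverDropped,
    Flagged,
}

/// Hypothetical PTY stand-in: yields one byte every few milliseconds, forever.
struct EndlessOutput;
impl Read for EndlessOutput {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        thread::sleep(Duration::from_millis(5));
        buf[0] = b'x';
        Ok(1)
    }
}

fn reader_loop<R: Read>(
    mut reader: R,
    tx: Sender<Vec<u8>>,
    should_terminate: Arc<AtomicBool>,
) -> StopReason {
    let mut buf = [0u8; 1024];
    loop {
        if should_terminate.load(Ordering::SeqCst) {
            return StopReason::Flagged;
        }
        match reader.read(&mut buf) {
            // This sketch treats read errors like EOF.
            Ok(0) | Err(_) => return StopReason::Eof,
            Ok(n) => {
                if tx.send(buf[..n].to_vec()).is_err() {
                    return StopReason::ReceiverDropped;
                }
            }
        }
    }
}

fn spawn_reader<R: Read + Send + 'static>(
    reader: R,
) -> (JoinHandle<StopReason>, Receiver<Vec<u8>>, Arc<AtomicBool>) {
    let (tx, rx) = channel();
    let flag = Arc::new(AtomicBool::new(false));
    let thread_flag = Arc::clone(&flag);
    let handle = thread::spawn(move || reader_loop(reader, tx, thread_flag));
    (handle, rx, flag)
}

fn main() {
    // Flag set: the loop exits on its next check even though output never ends.
    let (handle, _rx, flag) = spawn_reader(EndlessOutput);
    thread::sleep(Duration::from_millis(30));
    flag.store(true, Ordering::SeqCst);
    println!("{:?}", handle.join().unwrap());

    // Flag never set: the loop runs until the source reports EOF.
    let (handle, _rx, _flag) = spawn_reader(&b"last bytes before the child dies"[..]);
    println!("{:?}", handle.join().unwrap());
}
```

With the flag unset and the receiver still alive, the only remaining exit is EOF, which matches the trailing output described above.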

Timing Issue with TUI Cleanup

// Current order (problematic):
killpg(pgid, Signal::SIGTERM);
tokio::time::sleep(Duration::from_millis(250)).await;  // TUI still running!
killpg(pgid, Signal::SIGKILL);
cleanup_tui(tui_handle);  // Too late - TUI displayed output for 250ms

Proposed Solution

Recommended Fix: Remove Duplicate Interrupt Handler

Remove the main loop's interrupt handler and let the PTY executor handle interrupts exclusively.

Before:

let outcome = tokio::select! {
    result = execute_future => result?,
    _ = interrupt_rx_clone.changed() => {
        // ... 30 lines of duplicate interrupt handling
    }
};

After:

let outcome = execute_future.await?;

Rationale

  1. Eliminates race condition: One handler means no race
  2. PTY executor is already robust: Has SIGTERM → timeout → SIGKILL escalation
  3. Reader threads get signaled: should_terminate flag is set properly
  4. Simpler architecture: Single source of truth for interrupt handling
  5. Aligns with Ralph tenets: "The orchestrator is a thin coordination layer"
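
For readers unfamiliar with the SIGTERM → timeout → SIGKILL pattern named in point 2, here is a minimal Unix-only sketch using just the standard library. It is not the PTY executor's `terminate_child`: the function name `terminate_with_escalation` is hypothetical, it targets a single child rather than a process group, and because std cannot send SIGTERM directly it shells out to `kill`, which a real implementation would likely replace with a signal crate.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Asks the child to exit with SIGTERM, waits up to `grace`, then falls back
/// to SIGKILL. Returns the exit status and whether the fallback was needed.
fn terminate_with_escalation(
    child: &mut Child,
    grace: Duration,
) -> io::Result<(ExitStatus, bool)> {
    // Assumption: a `kill` binary is on PATH (true on typical Unix systems).
    Command::new("kill")
        .arg("-TERM")
        .arg(child.id().to_string())
        .status()?;

    // Poll until the child exits or the grace period runs out.
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok((status, false));
        }
        thread::sleep(Duration::from_millis(10));
    }

    // Grace period expired: Child::kill sends SIGKILL on Unix.
    child.kill()?;
    Ok((child.wait()?, true))
}

fn main() -> io::Result<()> {
    let mut child = Command::new("sleep").arg("30").spawn()?;
    let (status, escalated) =
        terminate_with_escalation(&mut child, Duration::from_millis(250))?;
    println!("exited: {status}, escalated to SIGKILL: {escalated}");
    Ok(())
}
```

The 250ms grace period in `main` mirrors the sleep in the snippets above; the right value depends on how long the child needs to shut down.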

Alternative: If Safety Net is Deemed Necessary

If the main loop interrupt handler is kept as a safety net, it should:

  1. Abort TUI immediately (before the 250ms sleep)
  2. Set a shared termination flag that reader threads can check
  3. Use coordinated handoff instead of racing

_ = interrupt_rx_clone.changed() => {
    // 1. Abort TUI immediately
    if let Some(handle) = tui_handle.take() {
        handle.abort();
    }

    // 2. Set shared termination flag (requires plumbing)
    shared_terminate.store(true, Ordering::SeqCst);

    // 3. Kill process group
    let _ = killpg(pgid, Signal::SIGTERM);
    tokio::time::sleep(Duration::from_millis(250)).await;
    let _ = killpg(pgid, Signal::SIGKILL);

    return Ok(TerminationReason::Interrupted);
}
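
The snippet above covers points 1 and 2 but not the coordinated handoff in point 3. One possible shape for it, sketched with std only and under assumptions: the executor reports completion of its own interrupt handling over a channel, and the main loop acts only if that report does not arrive within a grace period. `Shutdown`, `handoff`, and the `done` channel are hypothetical names, not existing Ralph APIs.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Shutdown {
    ExecutorCleanedUp,
    SafetyNetFired,
}

/// Waits up to `grace` for the executor to confirm that its own interrupt
/// handling finished; runs the fallback only if no confirmation arrives.
fn handoff(done_rx: &Receiver<()>, grace: Duration, force_kill: impl FnOnce()) -> Shutdown {
    match done_rx.recv_timeout(grace) {
        Ok(()) => Shutdown::ExecutorCleanedUp,
        // A timeout or a dropped sender both mean cleanup was not confirmed.
        Err(_) => {
            force_kill();
            Shutdown::SafetyNetFired
        }
    }
}

fn main() {
    let (done_tx, done_rx) = channel();

    // Hypothetical executor: finishes its own cleanup, then reports back.
    let executor = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20)); // pretend to terminate the child
        let _ = done_tx.send(());
    });

    let outcome = handoff(&done_rx, Duration::from_millis(250), || {
        println!("safety net: killing process group");
    });
    executor.join().unwrap();
    println!("{outcome:?}");
}
```

Because the main loop waits instead of acting at once, the two handlers no longer compete: the executor's cleanup always gets the first chance to run, and the fallback covers only the case where it hangs.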

Files Affected

  • crates/ralph-cli/src/main.rs - Main loop interrupt handler (lines 1850-1874)
  • crates/ralph-adapters/src/pty_executor.rs - PTY executor interrupt handling

Acceptance Criteria

  • Ctrl+C immediately stops all output
  • No race condition between interrupt handlers
  • Reader threads terminate cleanly
  • TUI exits immediately on interrupt (no 250ms delay)
  • Terminal returns to normal state
  • cargo test passes
  • cargo clippy passes
  • Manual testing confirms clean exit in all scenarios:
    • Non-TUI mode (ralph run)
    • TUI mode (ralph run -i)
    • During agent output
    • During tool execution
    • Double Ctrl+C

Test Plan

Manual Testing

# Test 1: Non-TUI mode interrupt
ralph run -c ralph.claude.yml -p "Write a detailed essay"
# Press Ctrl+C while output is streaming
# Expected: Immediate termination, no trailing output

# Test 2: TUI mode interrupt
ralph run -i -c ralph.claude.yml -p "Write a detailed essay"
# Press Ctrl+C while output is streaming
# Expected: TUI exits immediately, terminal restored

# Test 3: Stress test
for i in {1..10}; do
    timeout 5 ralph run -c ralph.claude.yml -p "Hi" &
    sleep 1
    kill -INT $!
    wait
done
echo "Terminal should be normal"

Related Issues

  • tasks/fix-termination-cleanup.code-task.md (completed) - Terminal state corruption on abort
  • tasks/fix-ctrl-c-freeze.code-task.md (completed) - TUI freeze after double Ctrl+C

Labels

bug, signal-handling, pty, tui

kerem closed this issue 2026-02-27 10:21:51 +03:00

@mikeyobrien commented on GitHub (Jan 20, 2026):

Fixed in v2.1.0! 🎉

The TUI refactor addressed the interrupt handling race condition by restructuring how iterations are managed and cleaned up.

# Update to v2.1.0
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/mikeyobrien/ralph-orchestrator/releases/download/v2.1.0/ralph-installer.sh | sh