[GH-ISSUE #54] BUG: [Regression #38] CLOSE_WAIT leak triggered by VPN/network interface change #35

Closed
opened 2026-02-27 07:17:35 +03:00 by kerem · 2 comments
Owner

Originally created by @bhaskoro-muthohar on GitHub (Jan 23, 2026).
Original GitHub issue: https://github.com/jwadow/kiro-gateway/issues/54

Kiro Gateway Version

latest (includes fix from #38)

What happened?

Gateway accumulated 100 CLOSE_WAIT connections after VPN disconnect. The fix from #38 (Connection: close header) works correctly under stable network conditions, but fails when the network interface changes (e.g., VPN connect/disconnect).

Root Cause

When VPN disconnects, connections made through the VPN interface become orphaned:

  1. Gateway starts with VPN connected, makes connections via 172.16.0.2
  2. VPN disconnects, interface disappears
  3. Those TCP sockets enter CLOSE_WAIT but can never be closed (no route)
  4. Orphaned connections accumulate indefinitely

Evidence

# 100 CLOSE_WAIT connections, from two different source IPs:
$ lsof -p <pid> | grep "CLOSE_WAIT" | awk '{print $9}' | cut -d: -f1 | sort | uniq -c
  11 172.16.0.2      # VPN interface (no longer exists)
  89 192.168.18.176  # Current interface

# VPN interface is gone:
$ netstat -rn | grep "172.16"
(empty)
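
The per-source-IP tally above can also be computed programmatically, which is handy when comparing counts before and after a VPN change. The following is a minimal sketch, assuming the usual `lsof` column layout where the ninth field is the `local->remote` endpoint pair. The function name `count_close_wait_by_source` and the `SAMPLE` lines (including the remote address) are invented for illustration and are not taken from the gateway or from the reporter's machine.

```python
from collections import Counter

# Invented sample in the shape of `lsof -p <pid>` output; not real data.
SAMPLE = """\
python 4242 user 12u IPv4 0xa1 0t0 TCP 172.16.0.2:51001->203.0.113.10:443 (CLOSE_WAIT)
python 4242 user 13u IPv4 0xa2 0t0 TCP 172.16.0.2:51002->203.0.113.10:443 (CLOSE_WAIT)
python 4242 user 14u IPv4 0xa3 0t0 TCP 192.168.18.176:51003->203.0.113.10:443 (CLOSE_WAIT)
python 4242 user 15u IPv4 0xa4 0t0 TCP 192.168.18.176:51004->203.0.113.10:443 (ESTABLISHED)
"""


def count_close_wait_by_source(lsof_output: str) -> Counter:
    """Count CLOSE_WAIT sockets per local (source) IP address."""
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        if "CLOSE_WAIT" not in line:
            continue
        fields = line.split()
        if len(fields) < 9:
            continue  # not a socket row in the expected layout
        # Ninth field looks like "172.16.0.2:51001->203.0.113.10:443".
        local_endpoint = fields[8].split("->", 1)[0]
        # Drop the port; rsplit keeps bracketed IPv6 hosts in one piece.
        source_ip = local_endpoint.rsplit(":", 1)[0]
        counts[source_ip] += 1
    return counts


if __name__ == "__main__":
    for ip, n in count_close_wait_by_source(SAMPLE).most_common():
        print(f"{n:4d} {ip}")
```

Feeding it the real `lsof -p <pid>` output instead of `SAMPLE` should give the same grouping as the `awk | cut | sort | uniq -c` pipeline for IPv4 sockets.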

Verification

After restarting the gateway (without VPN), the fix works correctly:

# Fresh gateway: 0 CLOSE_WAIT
# After 3 streaming requests: 0 CLOSE_WAIT
# After 10 concurrent streaming requests: 0 CLOSE_WAIT

Reproduction

  1. Start gateway with VPN connected
  2. Make several streaming requests
  3. Disconnect VPN
  4. Check: lsof -p <pid> | grep -c "CLOSE_WAIT"
  5. Observe orphaned connections from old VPN interface

Proposed Fix

Option 1: Don't use shared client for streaming requests
Option 2: Document workaround - restart gateway after VPN changes

Debug Logs

N/A - gradual resource leak, not immediate error.

Related

Regression of #38 - original fix works under stable network, fails on interface change

kerem closed this issue and added the bug and fixed labels (2026-02-27 07:17:35 +03:00)

@jwadow commented on GitHub (Jan 23, 2026):

Checked your logs. Found the root cause - it's not a regression of #38, it's a fundamental issue with connection pooling when network interfaces change.

Here's what happens: VPN disconnects -> sockets from old interface (172.16.0.2) get stuck in CLOSE_WAIT -> can't be closed because no route -> accumulate forever in the pool.

The Connection: close header from #38 tells the server not to reuse connections, but doesn't help with already-established connections that become orphaned.

Fixed it by using per-request clients for streaming. Makes sense because streaming requests are long-lived and don't benefit from pooling anyway. Non-streaming still uses shared pool.

Can you test? Should see CLOSE_WAIT stay at 0-1 after VPN changes instead of climbing to 100+
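
For readers who want to see the shape of the per-request approach described in this comment, here is a rough sketch. It is not the gateway's actual code: it uses the standard-library `http.client` as a stand-in for whatever HTTP client the gateway uses, and the function name `stream_once` and its `connection_factory` parameter are hypothetical. The only point it illustrates is that each streaming request owns its connection and closes it in a `finally` block, so nothing is handed back to a shared pool.

```python
import http.client
from typing import Callable, Iterator


def stream_once(
    host: str,
    port: int,
    path: str,
    chunk_size: int = 1024,
    timeout: float = 30.0,
    connection_factory: Callable[..., http.client.HTTPConnection] = http.client.HTTPSConnection,
) -> Iterator[bytes]:
    """Yield a response body in chunks over a connection used for this request only."""
    # A fresh connection per streaming request; it never enters a shared pool.
    conn = connection_factory(host, port, timeout=timeout)
    try:
        # Same header as the fix from #38: ask the server not to keep the connection alive.
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs on normal completion, on errors, and when the consumer
        # stops iterating early and the generator is closed.
        conn.close()
```

Non-streaming calls could keep using a shared, pooled client; only the long-lived streaming path pays the cost of a new connection per request.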


@bhaskoro-muthohar commented on GitHub (Jan 24, 2026):

Tested and confirmed fix works! 🎉

Test results:

| Scenario | CLOSE_WAIT |
|----------|------------|
| Initial | 0 |
| After 10 requests (no VPN) | 0 |
| After 10 requests (VPN on - 172.16.0.2) | 0 |
| After VPN disconnect | 0 |
| After 5 more requests | 0 |
| Orphaned 172.16.0.2 connections | 0 |

Previously this would climb to 100+ after VPN disconnect. Per-request client for streaming fixes it completely.

Thanks for the quick fix!
