[GH-ISSUE #19] Respect "retry-after-ms" or "retryDelay" when 429 or 503 response given #17

Closed
opened 2026-02-27 15:37:57 +03:00 by kerem · 14 comments
Owner

Originally created by @hwang-pixel on GitHub (Feb 19, 2026).
Original GitHub issue: https://github.com/NikkeTryHard/zerogravity/issues/19

What do you want?

We should retry the next request after the delay given in the server response (i.e. respect it) instead of relying only on incremental backoff.

Use Case

Every time Google responds with a 503 or 429, the response includes a wait time that should be respected.

Example

{
  "error": {
    "code": 429,
    "message": "You have exhausted your gemini-3-pro-high quota. Please try again later.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "QUOTA_EXHAUSTED",
        "domain": "aicode.googleapis.com",
        "metadata": {
          "uiMessage": "true",
          "quotaResetDelay": "2.463586755s",
          "quotaResetTimeStamp": "2026-02-12T08:18:19Z",
          "model": "gemini-3-pro-high"
        }
      },
      {
        "@type": "type.googleapis.com/google.rpc.LocalizedMessage",
        "locale": "en-US",
        "message": "You have exhausted your gemini-3-pro-high quota. Please try again later."
      },
      {
        "@type": "type.googleapis.com/google.rpc.RetryInfo",
        "retryDelay": "2.463586755s"
      }
    ]
  }
}

In the example above, the server asks for only ~2s before the next request. But sometimes the delay is 42s, 58s, etc., so incremental backoff sometimes won't match the time given in Google's response.

Please also note that the time given by Google can be very long, e.g. 5h20m123123.31231s. So you have to add logic that gives up instead of retrying when the delay from Google's response exceeds X, where X is a number the user configures via an env var (not a magic number or a hardcoded value).
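A minimal sketch of such a cap (the env var name `MAX_RETRY_DELAY_SECONDS` and the helper name are purely illustrative, not from any existing codebase):

```javascript
// Illustrative cap: honor the server delay only when it is within the
// user-configured limit; otherwise surface the error instead of blocking.
const MAX_RETRY_DELAY_MS = Number(process.env.MAX_RETRY_DELAY_SECONDS || 120) * 1000;

function shouldWait(serverDelayMs) {
    // null means no parseable delay was found in the response
    return serverDelayMs !== null && serverDelayMs <= MAX_RETRY_DELAY_MS;
}

console.log(shouldWait(2464));       // short delay: wait and retry
console.log(shouldWait(19_200_000)); // ~5h20m: exceeds cap, fail fast
```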

Here is an example regex to extract the time from Google's format:

// Try to extract "retry-after-ms" or "retryDelay" - check seconds format first (e.g. "7739.23s")
const secMatch = msg.match(/(?:retry[-_]?after[-_]?ms|retryDelay)[:\s"]+([\d.]+)(?:s\b|s")/i);
if (secMatch) {
    resetMs = Math.ceil(parseFloat(secMatch[1]) * 1000);
    logger.debug(`[CloudCode] Parsed retry seconds from body (precise): ${resetMs}ms`);
}
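As a quick standalone check, running that regex against the `retryDelay` value from the example error body above yields the seconds value:

```javascript
// Same regex as above, applied to the retryDelay string from the example 429 body.
const msg = '"retryDelay": "2.463586755s"';
const secMatch = msg.match(/(?:retry[-_]?after[-_]?ms|retryDelay)[:\s"]+([\d.]+)(?:s\b|s")/i);
const resetMs = Math.ceil(parseFloat(secMatch[1]) * 1000);
console.log(secMatch[1]); // "2.463586755"
console.log(resetMs);     // 2464
```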

Alternatives Considered

No response

kerem 2026-02-27 15:37:57 +03:00
Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

good point. i added time back off since it consumed all of my quota.

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

are you on v1.1.4? just to be sure

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

yup im on 1.1.4

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

i will just drop my rate-limit-parser.js so you can implement this in Rust easily. you just need to convert it

// Assumes `logger` and `formatDuration` helpers are imported from elsewhere in the project.
export function parseResetTime(responseOrError, errorText = '') {
    let resetMs = null;

    // If it's a Response object, check headers first
    if (responseOrError && typeof responseOrError.headers?.get === 'function') {
        const headers = responseOrError.headers;

        // Standard Retry-After header (seconds or HTTP date)
        const retryAfter = headers.get('retry-after');
        if (retryAfter) {
            const seconds = parseInt(retryAfter, 10);
            if (!isNaN(seconds)) {
                resetMs = seconds * 1000;
                logger.debug(`[CloudCode] Retry-After header: ${seconds}s`);
            } else {
                // Try parsing as HTTP date
                const date = new Date(retryAfter);
                if (!isNaN(date.getTime())) {
                    resetMs = date.getTime() - Date.now();
                    if (resetMs > 0) {
                        logger.debug(`[CloudCode] Retry-After date: ${retryAfter}`);
                    } else {
                        resetMs = null;
                    }
                }
            }
        }

        // x-ratelimit-reset (Unix timestamp in seconds)
        if (!resetMs) {
            const ratelimitReset = headers.get('x-ratelimit-reset');
            if (ratelimitReset) {
                const resetTimestamp = parseInt(ratelimitReset, 10) * 1000;
                resetMs = resetTimestamp - Date.now();
                if (resetMs > 0) {
                    logger.debug(`[CloudCode] x-ratelimit-reset: ${new Date(resetTimestamp).toISOString()}`);
                } else {
                    resetMs = null;
                }
            }
        }

        // x-ratelimit-reset-after (seconds)
        if (!resetMs) {
            const resetAfter = headers.get('x-ratelimit-reset-after');
            if (resetAfter) {
                const seconds = parseInt(resetAfter, 10);
                if (!isNaN(seconds) && seconds > 0) {
                    resetMs = seconds * 1000;
                    logger.debug(`[CloudCode] x-ratelimit-reset-after: ${seconds}s`);
                }
            }
        }
    }

    // If no header found, try parsing from error message/body
    if (!resetMs) {
        const msg = (responseOrError instanceof Error ? responseOrError.message : errorText) || '';

        // Try to extract "quotaResetDelay" first
        // Supports: "754.431528ms", "1.5s", "2h57m16.903944243s", "10636.903944243s"
        const quotaDurationMatch = msg.match(/quotaResetDelay[:"\s]+(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)(ms|s)/i);
        if (quotaDurationMatch) {
            const hours = quotaDurationMatch[1] ? parseInt(quotaDurationMatch[1], 10) : 0;
            const minutes = quotaDurationMatch[2] ? parseInt(quotaDurationMatch[2], 10) : 0;
            const secValue = parseFloat(quotaDurationMatch[3]);
            const unit = quotaDurationMatch[4].toLowerCase();
            if (unit === 'ms') {
                resetMs = Math.ceil(secValue);
            } else {
                resetMs = Math.ceil((hours * 3600 + minutes * 60 + secValue) * 1000);
            }
            logger.debug(`[CloudCode] Parsed quotaResetDelay from body: ${resetMs}ms`);
        }

        // Try to extract "quotaResetTimeStamp" (ISO format like "2025-12-31T07:00:47Z")
        if (!resetMs) {
            const quotaTimestampMatch = msg.match(/quotaResetTimeStamp[:\s"]+(\d{4}-\d{2}-\d{2}T[\d:.]+Z?)/i);
            if (quotaTimestampMatch) {
                const resetTime = new Date(quotaTimestampMatch[1]).getTime();
                if (!isNaN(resetTime)) {
                    resetMs = resetTime - Date.now();
                    // Even if expired or 0, we found a timestamp, so rely on it.
                    // But if it's negative, it means "now", so treat as small wait.
                    logger.debug(`[CloudCode] Parsed quotaResetTimeStamp: ${quotaTimestampMatch[1]} (Delta: ${resetMs}ms)`);
                }
            }
        }

        // Try to extract "retry-after-ms" or "retryDelay" - check seconds format first (e.g. "7739.23s")
        // Added stricter regex to avoid partial matches
        if (!resetMs) {
             const secMatch = msg.match(/(?:retry[-_]?after[-_]?ms|retryDelay)[:\s"]+([\d.]+)(?:s\b|s")/i);
             if (secMatch) {
                 resetMs = Math.ceil(parseFloat(secMatch[1]) * 1000);
                 logger.debug(`[CloudCode] Parsed retry seconds from body (precise): ${resetMs}ms`);
             }
        }

        if (!resetMs) {
            // Check for ms (explicit "ms" suffix or implicit if no suffix)
            const msMatch = msg.match(/(?:retry[-_]?after[-_]?ms|retryDelay)[:\s"]+(\d+)(?:\s*ms)?(?![\w.])/i);
            if (msMatch) {
                resetMs = parseInt(msMatch[1], 10);
                logger.debug(`[CloudCode] Parsed retry-after-ms from body: ${resetMs}ms`);
            }
        }

        // Try to extract seconds value like "retry after 60 seconds"
        if (!resetMs) {
            const secMatch = msg.match(/retry\s+(?:after\s+)?(\d+)\s*(?:sec|s\b)/i);
            if (secMatch) {
                resetMs = parseInt(secMatch[1], 10) * 1000;
                logger.debug(`[CloudCode] Parsed retry seconds from body: ${secMatch[1]}s`);
            }
        }

        // Try to extract duration like "1h23m45.5s" or "23m45s" or "45.123s"
        // Supports float seconds with nanosecond precision (e.g. "2h57m16.903944243s")
        if (!resetMs) {
            const durationMatch = msg.match(/(\d+)h(\d+)m(\d+(?:\.\d+)?)s|(\d+)m(\d+(?:\.\d+)?)s|(\d+(?:\.\d+)?)s/i);
            if (durationMatch) {
                if (durationMatch[1]) {
                    const hours = parseInt(durationMatch[1], 10);
                    const minutes = parseInt(durationMatch[2], 10);
                    const seconds = parseFloat(durationMatch[3]);
                    resetMs = Math.ceil((hours * 3600 + minutes * 60 + seconds) * 1000);
                } else if (durationMatch[4]) {
                    const minutes = parseInt(durationMatch[4], 10);
                    const seconds = parseFloat(durationMatch[5]);
                    resetMs = Math.ceil((minutes * 60 + seconds) * 1000);
                } else if (durationMatch[6]) {
                    resetMs = Math.ceil(parseFloat(durationMatch[6]) * 1000);
                }
                if (resetMs) {
                    logger.debug(`[CloudCode] Parsed duration from body: ${formatDuration(resetMs)}`);
                }
            }
        }

        // Try to extract ISO timestamp or Unix timestamp
        if (!resetMs) {
            const isoMatch = msg.match(/reset[:\s"]+(\d{4}-\d{2}-\d{2}T[\d:.]+Z?)/i);
            if (isoMatch) {
                const resetTime = new Date(isoMatch[1]).getTime();
                if (!isNaN(resetTime)) {
                    resetMs = resetTime - Date.now();
                    if (resetMs > 0) {
                        logger.debug(`[CloudCode] Parsed ISO reset time: ${isoMatch[1]}`);
                    } else {
                        resetMs = null;
                    }
                }
            }
        }
    }

    // SANITY CHECK: Handle very small or negative reset times
    // For sub-second rate limits (common with per-second quotas), add a small buffer
    // For negative or zero, use a reasonable minimum
    if (resetMs !== null) {
        if (resetMs <= 0) {
            logger.debug(`[CloudCode] Reset time invalid (${resetMs}ms), using 500ms default`);
            resetMs = 500;
        } else if (resetMs < 500) {
            // Very short reset - add 200ms buffer for network latency
            logger.debug(`[CloudCode] Short reset time (${resetMs}ms), adding 200ms buffer`);
            resetMs = resetMs + 200;
        }
        // Note: No longer enforcing 2s minimum - this was causing cascading failures
        // when all accounts had short rate limits simultaneously
    }

    return resetMs;
}
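As a standalone check of the h/m/s duration math in `parseResetTime` above, here is the same regex fragment extracted and applied to the long-form example from the issue body (10636s matches the delta quoted later in this thread):

```javascript
// Extracted duration math: "2h57m16.903944243s" -> milliseconds.
const m = '2h57m16.903944243s'.match(/(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s/);
const hours = m[1] ? parseInt(m[1], 10) : 0;   // 2
const minutes = m[2] ? parseInt(m[2], 10) : 0; // 57
const seconds = parseFloat(m[3]);              // 16.903944243
const resetMs = Math.ceil((hours * 3600 + minutes * 60 + seconds) * 1000);
console.log(resetMs); // 10636904
```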

/**
 * Parse the rate limit reason from error text
 * Used for smart backoff by error type (matches opencode-antigravity-auth)
 *
 * @param {string} errorText - Error message/body text
 * @param {number} [status] - HTTP status code (optional, for status-based classification)
 * @returns {'RATE_LIMIT_EXCEEDED' | 'QUOTA_EXHAUSTED' | 'MODEL_CAPACITY_EXHAUSTED' | 'SERVER_ERROR' | 'UNKNOWN'} Error reason
 */
export function parseRateLimitReason(errorText, status) {
    // Status code checks FIRST (matches opencode-antigravity-auth Rust parity)
    // 529 = Site Overloaded, 503 = Service Unavailable → Capacity issues
    if (status === 529 || status === 503) return 'MODEL_CAPACITY_EXHAUSTED';
    // 500 = Internal Server Error → Treat as Server Error (soft wait)
    if (status === 500) return 'SERVER_ERROR';

    const lower = (errorText || '').toLowerCase();

    // Check for quota exhaustion (daily/hourly limits)
    if (lower.includes('quota_exhausted') ||
        lower.includes('quotaresetdelay') ||
        lower.includes('quotaresettimestamp') ||
        lower.includes('resource_exhausted') ||
        lower.includes('daily limit') ||
        lower.includes('quota exceeded')) {
        return 'QUOTA_EXHAUSTED';
    }

    // Check for model capacity issues (temporary, retry quickly)
    if (lower.includes('model_capacity_exhausted') ||
        lower.includes('capacity_exhausted') ||
        lower.includes('model is currently overloaded') ||
        lower.includes('service temporarily unavailable')) {
        return 'MODEL_CAPACITY_EXHAUSTED';
    }

    // Check for rate limiting (per-minute limits)
    if (lower.includes('rate_limit_exceeded') ||
        lower.includes('rate limit') ||
        lower.includes('too many requests') ||
        lower.includes('throttl')) {
        return 'RATE_LIMIT_EXCEEDED';
    }

    // Check for server errors
    if (lower.includes('internal server error') ||
        lower.includes('server error') ||
        lower.includes('503') ||
        lower.includes('502') ||
        lower.includes('504')) {
        return 'SERVER_ERROR';
    }

    return 'UNKNOWN';
}
Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

feel free to use & implement. no copyright, i vibe coded it using opus 4.6. notice [cloudcode], it is a log prefix from another proxy project if you know it lol. but that rate-limit-parser.js is 100% my own customization, you can search to make sure i am not giving you someone else's code (it's 100% mine)

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

Can't thank you enough. im vibe coding too rn lol. reviewing as much as i can

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

v1.1.5-beta.2 — Server-Specified Retry Delays

This is now implemented. The proxy will:

  1. Parse retryDelay from google.rpc.RetryInfo (e.g. "2.463586755s")
  2. Parse quotaResetDelay from google.rpc.ErrorInfo metadata (e.g. "2h57m16.9s")
  3. Parse quotaResetTimeStamp from metadata (ISO 8601 → delta from now)
  4. Use server-specified delay instead of blind backoff when available
  5. Reject retries when delay exceeds configurable max (default 120s) — prevents quota burn
  6. Sanity floor for sub-500ms delays (+200ms buffer to avoid retry storms)
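The decision flow those six points describe can be sketched roughly as follows (illustrative JS mirroring the parser posted earlier; the function and parameter names here are made up, and the actual Rust implementation may differ):

```javascript
// Illustrative retry decision: server delay if usable, capped, with fallback.
function pickRetryDelayMs(parsedMs, maxDelaySec = 120) {
    if (parsedMs === null) {
        // No parseable delay: fall back to a random 2-5s wait.
        return 2000 + Math.floor(Math.random() * 3000);
    }
    if (parsedMs > maxDelaySec * 1000) {
        // Server asks to wait longer than the cap: give up immediately.
        return null;
    }
    // Sanity floor: pad sub-500ms delays to avoid retry storms.
    return parsedMs < 500 ? parsedMs + 200 : parsedMs;
}

console.log(pickRetryDelayMs(2464));       // 2464
console.log(pickRetryDelayMs(10_636_000)); // null (exceeds cap)
```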

Configuration

# Max delay before giving up (default: 120s)
# If Google says "wait 5h20m", proxy returns error instead of waiting
export ZEROGRAVITY_MAX_RETRY_DELAY=120

How it works

Google returns 429 with retryDelay: "2.46s"
  → Proxy waits ~2.7s (2.46s + jitter) then retries on new cascade ✅

Google returns 429 with quotaResetDelay: "2h57m16s"  
  → Delay (10636s) > max cap (120s) → immediate error, no retry ✅

Google returns 429 with no parseable delay
  → Falls back to random 2-5s delay (existing behavior) ✅

Credit to @hwang-pixel for the quotaResetTimeStamp parsing idea and sanity floor — adapted from your JS parser 🤝

Commit: a23cd72

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

wow so fast, thank you so much to make this happen.

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

v1.1.5-beta.2 is out with this fix. Please test and report back!

Binary:

curl -fsSL https://github.com/NikkeTryHard/zerogravity/releases/download/v1.1.5-beta.2/zerogravity-linux-x86_64 -o zerogravity
chmod +x zerogravity

Docker:

docker pull ghcr.io/nikketryhard/zerogravity:1.1.5-beta.2

New env var ZEROGRAVITY_MAX_RETRY_DELAY (default 120s) — delays beyond this cap will return the error immediately instead of waiting.

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

hey, did you just commit the entire source?

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

because i try to look it up with incognito, and it shows src

Author
Owner

@hwang-pixel commented on GitHub (Feb 19, 2026):

brother be careful @NikkeTryHard

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

bro yeah it was so dangerous

Author
Owner

@NikkeTryHard commented on GitHub (Feb 19, 2026):

added GitHub Actions for PRs and pushes so it will hopefully never happen again
