[GH-ISSUE #1015] [BUG] Worker failure appears to increment retry counter #2510

Open
opened 2026-03-15 20:44:51 +03:00 by kerem · 3 comments

Originally created by @kmcgovern-apixio on GitHub (Jan 29, 2025).
Original GitHub issue: https://github.com/hibiken/asynq/issues/1015

Originally assigned to: @hibiken, @kamikazechaser on GitHub.

Describe the bug
When a worker fails, a task moves into the archived state without retrying if asynq.MaxRetries(0) is set on the task. Setting it to any positive value X allows X worker failures, i.e. setting asynq.MaxRetries(3) allows the task to be picked up again up to 3 times with worker failures as the cause.

Environment (please complete the following information):

  • OS: linux
  • asynq package version: 0.25.1
  • Redis version: 7.4.2

To Reproduce
Steps to reproduce the behavior (Code snippets if applicable):

  1. Create a task with asynq.MaxRetries(0)
  2. Run the task
  3. Kill the worker before the task completes (e.g. Ctrl+C)
  4. Check the task state in Redis: hgetall "asynq:{default}:t:8abc3be7-2f6f-4ac4-a610-1c0a23188d96"
  5. Start the worker back up
  6. After the lease expires, the task moves to the archived state (can be checked with hgetall)
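
The steps above are consistent with a recovery rule that charges an expired lease against the retry budget like any other failed attempt. The following is a minimal Go model of that rule, not the library's code: the `task` type and the `onLeaseExpired` and `simulateCrashes` functions are hypothetical names invented for illustration, and the comparison itself is an assumption inferred from the reported behavior.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a queued message; it is not an asynq type.
type task struct {
	maxRetry int // retry budget set at enqueue time
	retried  int // attempts already charged against the budget
}

// onLeaseExpired models the reported behavior (an assumption): an expired
// lease is charged against the retry budget exactly like a handler error.
func onLeaseExpired(t *task) string {
	if t.retried >= t.maxRetry {
		return "archived"
	}
	t.retried++
	return "retry"
}

// simulateCrashes applies up to n worker crashes in sequence and returns the
// state after each one, stopping once the task is archived.
func simulateCrashes(maxRetry, n int) []string {
	t := &task{maxRetry: maxRetry}
	var states []string
	for i := 0; i < n; i++ {
		s := onLeaseExpired(t)
		states = append(states, s)
		if s == "archived" {
			break
		}
	}
	return states
}

func main() {
	fmt.Println(simulateCrashes(0, 1)) // budget 0: archived on the first crash
	fmt.Println(simulateCrashes(3, 4)) // budget 3: three retries, then archived
}
```

With a budget of 0 the very first crash archives the task, and with a budget of 3 the task survives three crashes before being archived, which matches the report.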

Expected behavior
Tasks are picked back up after a worker failure. The retry count is incremented only when a task returns an error or panics.
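
One way to read this expectation is as a rule that separates the cause of a failed attempt before touching the retry count. The sketch below is a hypothetical model in plain Go, not asynq code: `failureCause`, `job`, and `next` are invented names, and the state strings are illustrative only.

```go
package main

import "fmt"

// failureCause is a hypothetical classification of why an attempt ended.
type failureCause int

const (
	handlerError failureCause = iota // handler returned an error or panicked
	workerLost                       // worker died and the lease expired
)

// job is a hypothetical stand-in for a queued message; it is not an asynq type.
type job struct {
	maxRetry int // retry budget set at enqueue time
	retried  int // attempts already charged against the budget
}

// next returns the state the job moves to after a failed attempt.
// Only handler errors are charged against the retry budget.
func next(j *job, cause failureCause) string {
	if cause == workerLost {
		return "pending" // requeue without touching j.retried
	}
	if j.retried >= j.maxRetry {
		return "archived"
	}
	j.retried++
	return "retry"
}

func main() {
	j := &job{maxRetry: 0}
	fmt.Println(next(j, workerLost))   // requeued, budget untouched
	fmt.Println(next(j, handlerError)) // budget of 0 is exhausted, so archived
}
```

A trade-off of this rule is that a task which reliably kills its worker would be requeued indefinitely, since worker loss never consumes the budget.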

Screenshots
Output from redis-cli (payloads and job name redacted):

127.0.0.1:6379> hgetall "asynq:{default}:t:8abc3be7-2f6f-4ac4-a610-1c0a23188d96"
1) "state"
2) "active"
3) "msg"
4) "\n\x1finternal:myjobname\x12\xc2\x02taskpayloadhere\x1a$8abc3be7-2f6f-4ac4-a610-1c0a23188d96\"\adefault@\x88\x0e`\x80\xa3\x05"
127.0.0.1:6379> hgetall "asynq:{default}:t:8abc3be7-2f6f-4ac4-a610-1c0a23188d96"
1) "state"
2) "archived"
3) "msg"
4) "\n\x1finternal:myjobname\x12\xc2\x02taskpayloadhere\x1a$8abc3be7-2f6f-4ac4-a610-1c0a23188d96\"\adefault:\x19asynq: task lease expired@\x88\x0eX\xa6\xe7\xe6\xbc\x06`\x80\xa3\x05"

Additional context
For now I am bumping the max retry value to handle this, but I expected worker failures not to affect retries.


@kamikazechaser commented on GitHub (May 15, 2025):

https://github.com/hibiken/asynq/blob/master/recoverer.go#L99

Possibly related to the above.

> asynq.MaxRetries(0)

Looks like an edge case. We could handle this.


@kamikazechaser commented on GitHub (May 15, 2025):

@kmcgovern-apixio Try using the sohail/recoverer-fix branch and see if it fixes your issue.


@dmitrii-doronin commented on GitHub (Dec 29, 2025):

@kamikazechaser, hi. Thank you for taking a look at the issue. Appreciate it.

It seems that the proposed solution would cause worker failures to always be retried. What's your opinion on introducing isRetryableFunc or something similar to allow more granular control here? It might not even be a breaking change if the library provides a default retry function through the options.

If there's a more appropriate way to handle the case in this thread, I would be really glad if you could point it out.
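
To make the suggestion concrete, here is a rough sketch of what such a hook could look like. Everything in it is hypothetical: the `isRetryableFunc` signature, `defaultIsRetryable`, `retryLostWorkers`, and the local `errLeaseExpired` sentinel are assumptions written for illustration in plain Go, and none of them are part of the asynq API.

```go
package main

import (
	"errors"
	"fmt"
)

// errLeaseExpired is a local sentinel standing in for the "task lease expired"
// error text seen in the archived message; it is not the library's variable.
var errLeaseExpired = errors.New("task lease expired")

// errBoom is an arbitrary handler error used for the demonstration.
var errBoom = errors.New("boom")

// isRetryableFunc is a hypothetical hook: given the failure and the current
// counters, it reports whether the task should be retried instead of archived.
type isRetryableFunc func(err error, retried, maxRetry int) bool

// defaultIsRetryable keeps the budget-only rule: every failure, whatever its
// cause, is retried only while budget remains.
func defaultIsRetryable(err error, retried, maxRetry int) bool {
	return retried < maxRetry
}

// retryLostWorkers wraps a base policy so that lease expiry is always retried,
// while all other errors fall through to the base policy.
func retryLostWorkers(base isRetryableFunc) isRetryableFunc {
	return func(err error, retried, maxRetry int) bool {
		if errors.Is(err, errLeaseExpired) {
			return true
		}
		return base(err, retried, maxRetry)
	}
}

func main() {
	custom := retryLostWorkers(defaultIsRetryable)
	fmt.Println(defaultIsRetryable(errLeaseExpired, 0, 0)) // default: archive
	fmt.Println(custom(errLeaseExpired, 0, 0))             // custom: retry
	fmt.Println(custom(errBoom, 0, 0))                     // handler error: archive
}
```

Shipping the budget-only rule as the default would keep existing behavior unchanged, while callers who want worker failures retried could opt in with a wrapper like the one above.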

Reference
starred/asynq#2510