[GH-ISSUE #717] [QUESTION] Could multiple worker instances cause task duplication in the queue due to the recovery mechanism? #346

Open
opened 2026-03-02 05:20:37 +03:00 by kerem · 4 comments
Owner

Originally created by @WSUFan on GitHub (Aug 6, 2023).
Original GitHub issue: https://github.com/hibiken/asynq/issues/717

Hi @hibiken. I have a follow-up question regarding running multiple instances of workers that serve the same queue. As we have a recovery mechanism now, could there be a possibility that several recovery processes pick up and re-enqueue the same lease expired task multiple times (so the queue will have multiple entries of the same task ID)?

Originally created by @WSUFan on GitHub (Aug 6, 2023). Original GitHub issue: https://github.com/hibiken/asynq/issues/717 Hi @hibiken. I have a follow-up question regarding running multiple instances of workers that serve the same queue. As we have a recovery mechanism now, could there be a possibility that several recovery processes pick up and re-enqueue the same lease expired task multiple times (so the queue will have multiple entries of the same task ID)?
Author
Owner

@hibiken commented on GitHub (Aug 6, 2023):

@WSUFan thanks for pointing this out.

I believe you're right, this needs to be fixed.
Approaches I can think of:

  1. take a lock on these redis keys (asynq:{}:lease) when running listLeaseExpired command (lock with TTL to avoid unfortunate situation)
  2. within listLeaseExpired, we could move these expired task ids to "stagin" area so that they are not in the asynq:{}:lease anymore

Any feedback is appreicated.

<!-- gh-comment-id:1666884588 --> @hibiken commented on GitHub (Aug 6, 2023): @WSUFan thanks for pointing this out. I believe you're right, this needs to be fixed. Approaches I can think of: 1) take a lock on these redis keys (asynq:{<qname>}:lease) when running `listLeaseExpired` command (lock with TTL to avoid unfortunate situation) 2) within `listLeaseExpired`, we could move these expired task ids to "stagin" area so that they are not in the asynq:{<qname>}:lease anymore Any feedback is appreicated.
Author
Owner

@WSUFan commented on GitHub (Aug 6, 2023):

yeah, maybe a redis lock is needed when operating with these keys. I would choose approach 1, haha

<!-- gh-comment-id:1666921509 --> @WSUFan commented on GitHub (Aug 6, 2023): yeah, maybe a redis lock is needed when operating with these keys. I would choose approach 1, haha
Author
Owner

@yousifh commented on GitHub (Aug 7, 2023):

While there is a race condition, is there an actual risk of task duplication?

Take 2 workers W1 and W2 and an expired task T1 and consider this timeline of sequential events

  • W1 calls ListLeaseExpired and gets T1
  • W2 calls ListLeaseExpired and gets T1
  • W1 calls retry It removes T1 from asynq:{<qname>}:active and asynq:{<qname>}:lease and moves it to asynq:{<qname>}:retry
  • W2 calls retry but it will fail because calling LREM on asynq:{<qname>}:active return 0 elements.

Now of course both W1 and W2 having the same expired tasks is not ideal and there is a race condition where in the last step above, W2 might LREM task T1 after it has been re-enqueued again.

So I think your proposal of a Redis lock around listLeaseExpired is good solution so each worker is dealing with disjoint set of expired tasks

<!-- gh-comment-id:1667980849 --> @yousifh commented on GitHub (Aug 7, 2023): While there is a race condition, is there an actual risk of task duplication? Take 2 workers W1 and W2 and an expired task T1 and consider this timeline of sequential events - W1 calls `ListLeaseExpired` and gets T1 - W2 calls `ListLeaseExpired` and gets T1 - W1 calls `retry` It removes T1 from `asynq:{<qname>}:active` and `asynq:{<qname>}:lease` and moves it to `asynq:{<qname>}:retry` - W2 calls `retry` but it will fail because calling `LREM` on `asynq:{<qname>}:active` return 0 elements. Now of course both W1 and W2 having the same expired tasks is not ideal and there is a race condition where in the last step above, W2 might `LREM` task T1 after it has been re-enqueued again. So I think your proposal of a Redis lock around `listLeaseExpired` is good solution so each worker is dealing with disjoint set of expired tasks
Author
Owner

@WSUFan commented on GitHub (Aug 7, 2023):

I believe another solution would be to have the client side manage this. We could execute the listLeaseExpired() function on the client side and retrieve the task initiated by this client. In this scenario, there would only be one retry (re-enqueuing the same task) from this client, assuming each task has a unique ID.

<!-- gh-comment-id:1668182352 --> @WSUFan commented on GitHub (Aug 7, 2023): I believe another solution would be to have the client side manage this. We could execute the ```listLeaseExpired()``` function on the client side and retrieve the task initiated by this client. In this scenario, there would only be one retry (re-enqueuing the same task) from this client, assuming each task has a unique ID.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/asynq#346
No description provided.