[GH-ISSUE #329] [Question] Are there TTL set for tasks? How long does it stay in Redis? #2168

Closed
opened 2026-03-15 19:31:26 +03:00 by kerem · 13 comments
Owner

Originally created by @seanyu4296 on GitHub (Sep 23, 2021).
Original GitHub issue: https://github.com/hibiken/asynq/issues/329

Originally assigned to: @hibiken on GitHub.

Asking because I cannot find this in the wiki, and others might find it useful.

Specifically, how long do archived tasks stay in Redis, so that we can keep track of them or retry them manually?

kerem 2026-03-15 19:31:26 +03:00
Author
Owner

@hibiken commented on GitHub (Sep 23, 2021):

@seanyu4296 Thank you for the question! I should document this somewhere.

Currently these are defined here as constants:

https://github.com/hibiken/asynq/blob/b3ef9e91a9cec1f9890f3f34653aaf6b90388dbd/internal/rdb/rdb.go#L528-L531

In other words, tasks archived 90 days ago (or older) will be deleted. If the archive size reaches 10,000, Asynq will start deleting the oldest tasks to keep the size at 10,000 or smaller (even if the oldest tasks are less than 90 days old).

Let me know if you have more questions on this :)

<!-- gh-comment-id:926000118 -->

@seanyu4296 commented on GitHub (Sep 24, 2021):

Thank you again so much for the swift and detailed response @hibiken

From the code now, there is no way to override it right? correct me if I'm wrong?

<!-- gh-comment-id:926461072 -->

@seanyu4296 commented on GitHub (Sep 24, 2021):

  1. Another follow-up question: how often does the worker poll to get new tasks, mainly for "scheduled" tasks? I'm looking through the code now and I see there is a process that moves tasks from scheduled -> pending in forwarder.go. What's next after that?

  2. Is the execution accurate down to the millisecond?

Let's say I enqueued a task at 00:00:00.000 and scheduled it to process after a minute
`asynq.enqueue(..., asynq.ProcessIn(time.Minute * 1))`

Is there a guarantee that it will execute after 00:01:00.000? Or is there a possibility for it to run a bit early, because of some polling and zscore comparison?

<!-- gh-comment-id:926559452 -->

@hibiken commented on GitHub (Sep 30, 2021):

@seanyu4296 Sorry for the delayed response.

Yes, you're right. Currently, users cannot configure `archiveMaxSize` and `archivedExpirationInDays`. Let me know if you have a need to configure these.

As for question #1, after a task goes to `pending` it's FIFO from there: all available workers look at the pending list and start processing in FIFO fashion.

To answer question #2: no, it's not. It's at a granularity of seconds. It will always be *no earlier* than the time specified by `ProcessIn` or `ProcessAt`. The actual delay depends on how backed up the pending list is, and also on the cadence of the forwarder (currently every 5s).

These are all good questions, let me know if you have more!

<!-- gh-comment-id:931342448 -->

@seanyu4296 commented on GitHub (Sep 30, 2021):

> Yes, you're right. Currently, user cannot configure archiveMaxSize and archivedExpirationInDays. Let me know if you have a need to configure these.

So far, there is no need but it would be nice to have.

Thanks for answering my questions. Sorry, I also didn't get back here after I looked through the code. The way I figured it out was starting in rdb.go and going up (e.g. start with the function `dequeue` and check where it is used).

Anyway, I noticed we now use RPOPLPUSH instead of BRPOPLPUSH. What's the reason behind that? Is it because Lua scripts are blocking by default?

### On Task Execution Timing

So, the `forwarder` only polls and forwards every 5 seconds ([code](https://github.com/hibiken/asynq/blob/b3ef9e91a9cec1f9890f3f34653aaf6b90388dbd/forwarder.go#L64-L66)), and the `processor` "pulls" every second depending on the pending tasks, based on this [code](https://github.com/hibiken/asynq/blob/b3ef9e91a9cec1f9890f3f34653aaf6b90388dbd/processor.go#L174-L174). Correct me if I am wrong, or if you have further thoughts.

Does that mean that if I enqueue a task at 00:00:02.000 to be processed after a second, and I started my asynq server at 00:00:00.000, the task will roughly get processed at 00:00:05.000?

Thanks so much too @hibiken for answering and sharing your knowledge! 🙏

<!-- gh-comment-id:931371999 -->

@hibiken commented on GitHub (Oct 1, 2021):

> So far, there is no need but it would be nice to have.

Got it, yes, we can make this configurable. I'll defer it for now; I don't want to add a feature not used by anyone :)

> I noticed now we use RPOPLPUSH instead of BRPOPLPUSH. What's the reason behind that? Is it because of lua script being blocking by default?

It's because we want to consult multiple queues in order (based on priority) for any pending task, but don't want to get blocked by an empty queue. For example, let's say our server is consuming tasks from Queue X, Y, and Z, and based on the priority configuration we should consult them in X, Y, Z order (i.e. priority level X > Y > Z).
If Queues X and Y are empty but Queue Z has some pending tasks, we want to process those tasks. That's why we use the non-blocking `RPOPLPUSH` instead of the blocking variant. It'd be nice if Redis supported multiple sources for `BRPOPLPUSH`, but that's not the case right now. Related issue: https://github.com/redis/redis/issues/1785
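The priority-ordered, non-blocking consultation described above can be sketched with plain in-memory queues (a simplified illustration only; `dequeue` is a hypothetical name, and the real mechanism is a Lua script over Redis lists):

```go
package main

import "fmt"

// dequeue consults queues in priority order and pops from the first
// non-empty one, without ever blocking on an empty queue. This is the
// reason for preferring non-blocking RPOPLPUSH over BRPOPLPUSH.
func dequeue(queues map[string][]string, priority []string) (string, bool) {
	for _, qname := range priority {
		q := queues[qname]
		if len(q) > 0 {
			task := q[0] // FIFO: oldest pending task first
			queues[qname] = q[1:]
			return task, true
		}
	}
	return "", false // every queue empty; caller sleeps briefly and retries
}

func main() {
	queues := map[string][]string{
		"x": {},          // highest priority, empty
		"y": {},          // empty
		"z": {"task-z1"}, // lowest priority, has work
	}
	task, ok := dequeue(queues, []string{"x", "y", "z"})
	fmt.Println(task, ok) // prints: task-z1 true
}
```

A blocking pop on Queue X here would stall forever even though Queue Z has work, which is exactly the situation the comment describes.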

> So, the forwarder only polls and forwards every 5 seconds and the processor "pulls" every second depending on the pending tasks based from this. Correct me if I am wrong? or if you have further thoughts

This one-second sleep only happens when all of the queues are empty (to use the previous example, when Queues X, Y, and Z are all empty). Since we are basically polling Redis for any pending tasks, if we didn't sleep there, we'd be slamming Redis in an infinite loop, adding unnecessary load. It would also be bad because other Lua scripts couldn't execute, since all Redis commands/scripts are executed in a single-threaded fashion.
https://github.com/hibiken/asynq/blob/b3ef9e91a9cec1f9890f3f34653aaf6b90388dbd/processor.go#L174-L174

The forwarder moves tasks from `scheduled` or `retry` state to `pending` in a batch fashion every 5s. So if a task A is enqueued with `ProcessAt(T1)`, it will only get forwarded to the `pending` state after time T1, but that can happen anywhere between T1 and T1 + 5s. So 5s is the maximum delay in terms of when the task becomes pending.

Once a task is pending, it's a matter of how much backlog the queue has in the pending set. Ideally the pending set is small, so that as soon as tasks become pending they get picked up by a worker (state becomes `active`). But if you have a large backlog in the pending set, then, since tasks are processed in FIFO order, it will take some time before the task gets picked up by a worker. (Incidentally, we have a feature request to surface this queue "latency" in our tooling.)

If you haven't already, please take a look at this wiki: https://github.com/hibiken/asynq/wiki/Life-of-a-Task and suggest any modification or make edits if you find some information is missing there.

<!-- gh-comment-id:932221316 -->

@hibiken commented on GitHub (Oct 12, 2021):

Closing this. Let me know if you have any more questions!

<!-- gh-comment-id:941045598 -->

@csdenboer commented on GitHub (Oct 12, 2021):

First of all, thanks a lot for your work on asynq! I was searching for the reason why there's a lot of latency between a task being enqueued and being received by a worker. Apparently, this is necessary to support multiple (prioritized) queues. However, I think there are also many users who are just using a single queue (like me), and for those I believe the performance penalty is huge. Is there a way to use BRPOPLPUSH in the case of only a single queue?

<!-- gh-comment-id:941084577 -->

@hibiken commented on GitHub (Oct 12, 2021):

@csdenboer thanks for the comment and good observation!
I vaguely remember that we used to use BRPOPLPUSH if the user only specified one queue in the Config. I'll look into this and see if we can conditionally run BRPOPLPUSH in the single-queue case.

Just to get more context: when are you noticing high latency? Is it when enqueuing with either the `ProcessIn` or `ProcessAt` option, or without them?
Off the top of my head, the delay should be no more than one second unless the queue has a huge backlog of tasks.

<!-- gh-comment-id:941644108 -->

@csdenboer commented on GitHub (Oct 13, 2021):

Thanks! Yes, we are experiencing high latency when enqueuing without the ProcessIn or ProcessAt option. The delay seems to be at most 1 second indeed, but that is very significant for our use case… In my experience when using BRPOPLPUSH the latency should be somewhere in the order of milliseconds. It would be very nice if this can be resolved!

<!-- gh-comment-id:941949752 -->

@xuyang2 commented on GitHub (Jul 13, 2022):

@hibiken It seems `archiveCmd` only sets the expiry time for `KEYS[5]` and `KEYS[6]`, not the expiry time for `KEYS[1]`.

Is there any other cleanup routine for `KEYS[1] -> asynq:{<qname>}:t:<task_id>`?

https://github.com/hibiken/asynq/blob/c70ff6a335eed05735427567e8e9549745f43204/internal/rdb/rdb.go#L824-L867

<!-- gh-comment-id:1182890036 -->

@roccoblues commented on GitHub (Jul 26, 2022):

> > So far, there is no need but it would be nice to have.
>
> Got it, yes we can make this configurable. I'll defer it for now, I don't want to add a feature not used by anyone :)

I found this thread because we actually need it. 😏 The default retention period is too long for the German data-privacy laws and we need to expire tasks sooner.

Would you be open for a PR that makes the two settings configurable?

<!-- gh-comment-id:1195356866 -->

@hibiken commented on GitHub (Jul 27, 2022):

@roccoblues Sure, I'm a little behind on PR reviews but I'll get to it when I have time.
Please take a look at CONTRIBUTION.md before creating a PR 👍 Thanks!

<!-- gh-comment-id:1196180773 -->