mirror of
https://github.com/hibiken/asynq.git
synced 2026-04-26 07:25:56 +03:00
[GH-ISSUE #642] [BUG] Successful task is being re-run because of a "context deadline exceeded" error #318
Labels
No labels
CLI
bug
designing
documentation
duplicate
enhancement
good first issue
good first issue
help wanted
idea
invalid
investigate
needs-more-info
performance
pr-welcome
pull-request
question
wontfix
work in progress
work in progress
work-around-available
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/asynq#318
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @nikhilsaraf on GitHub (Apr 10, 2023).
Original GitHub issue: https://github.com/hibiken/asynq/issues/642
Originally assigned to: @hibiken on GitHub.
Describe the bug
I'm not sure what the bug is, but can answer more questions or share information as needed.
If there are any logs I can look at to help understand when a task is completed and when the context deadline may have been exceeded I'm happy to share those
To Reproduce
Steps to reproduce the behavior (Code snippets if applicable):
Also, this has never happened before for me even with the same workflow and nothing has changed in my workflow
Is it possible that I have a memory leak somewhere which is keeping my thread active?
Expected behavior
Message should have been run exactly once and should have ended when my handler function finished.
Screenshots
Environment (please complete the following information):
asynqpackage = v0.24.0Additional context
First task (started at 20230410T131137UTC):
There is no log line for
wrote task result for TaskID, which is probably why we retried it (even though it succeeded)?Not sure why the code to execute
wrote task result for TaskIDwas not run (see below), since that's never happened before. I think that has something to do with the context deadline exceeded which "cut off" the message's time.I'm thinking may it did not execute because the system was busy and the thread was paused and didn't get enough CPU time?
Second task (started at 20230410T134157UTC)
I am currently calling
writeOutputToResultWriterin adeferstatement, which never got executed for some reason. This is the function that writes thewrote task result for TaskIDlog line.The solution I am thinking now is to remove all my defer statements as I was expecting the defer to be called no matter what happened, but that doesn't seem to be the case here
Please also note that I have a
panicstatement in one my defer calls that enqueues more messages and would panic if that failed. I can't access the log line for that since I run the queue server in ascreencall on linux and can't scroll back up to see any panic statement that may have been triggered. The only plausible explanation is that the enqueue failed and caused the panic call to get run, which caused my task to be re-executed. So then that leaves the question of whether there is any way for me to find out why thepanicwas called when enqueuing the subsequent messages from inside the taskHandler. I am updating my code to be a little more debuggable but wondering if you have any advice on where to go from here on this specific issue@nikhilsaraf commented on GitHub (Apr 10, 2023):
The larger question I'm trying to understand is why the second (retry) task started at
2023/04/10T13:41:57.708944UTCeven though the first task was still running at2023/04/10T13:53:57.043366UTCIs there something I'm doing wrong that is causing the same task to be run twice concurrently?
@bfwbbrj commented on GitHub (Apr 11, 2023):
It's probably caused by your code running longer than usual. Currently,
ProcessTaskhas a SOFT EXECUTION TIME LIMIT of30 secondsand is checked&refreshed every5 seconds, and a HARD EXECUTION TIME LIMIT default to30 secondsand can be overridden by settingTimeout/Deadlineor both in your Msg. Note that if neither was set, the30 secondsHARD LIMIT will be used and this should be the reason you are seeing thecontext deadline exceedederror. I suggest you set one of them to a large enough value to avoid this kind of issue, you can find the docs here.You should be able to use
deferandrecoverto recover yourpanicand log relevant infos, you can find more in this blog.P.S. Note that asynq only guarantees
at least oncemessage delivery, so it'd be better to make sure all your tasks are idempotent.@nikhilsaraf commented on GitHub (Apr 11, 2023):
The soft and hard execution time limits are very helpful, thank you.
The first task that failed ran for 30 minutes
I will of course restructure the task to be run with in the deadline time, however, I’m now wondering why it didn’t fail much sooner because of the context deadline exceeded?
(see timings in the log lines posted above)
Note: I don’t sent any deadlines today
@bfwbbrj commented on GitHub (Apr 11, 2023):
I made a typo, the default timeout is 30 minutes. And this should explain why it didn't fail much sooner.
Actually, you don't have to. You can choose to set a larger timeout value to avoid task failures due to timeouts. (Though task restructure works as well ) :)
@nikhilsaraf commented on GitHub (Apr 11, 2023):
Ok great, it all makes sense now. Thank you for addressing all my questions!