[GH-ISSUE #420] [BUG] Task stuck in active state forever #2210

Open
opened 2026-03-15 19:41:38 +03:00 by kerem · 15 comments
Owner

Originally created by @mailbaoer on GitHub (Mar 17, 2022).
Original GitHub issue: https://github.com/hibiken/asynq/issues/420

Originally assigned to: @hibiken on GitHub.

Describe the bug
I have some long-running tasks with no timeout set. I found that some of them stay in the running state forever. I tried to cancel them in the Web UI and the CLI, but they can't be canceled; the state changes back to running after canceling.

To Reproduce
Steps to reproduce the behavior (Code snippets if applicable):
Sorry, I don't know how to reproduce it.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
![screenshot-20220317-101327](https://user-images.githubusercontent.com/5282978/158722979-f6f28b84-5c1c-468a-8fed-036338cea7eb.png)

Environment (please complete the following information):

  • OS: [e.g. MacOS, Linux]
  • Version of asynq package [e.g. v1.0.0]

Additional context
Add any other context about the problem here.


@hibiken commented on GitHub (Mar 17, 2022):

@mailbaoer Thank you for opening an issue!

Would you mind providing the version of the asynq package you are using? :)


@mailbaoer commented on GitHub (Mar 17, 2022):

I'm using 0.22.1 now, but this bug may have existed before this version; I've seen it in other versions, maybe as far back as 0.18 when I first used asynq.


@hibiken commented on GitHub (Mar 17, 2022):

I see.

We've made some improvements around orphaned task recovery in v0.22. If you are using the latest version of the Web UI (v0.6.0), you'll see the status of these tasks shown as "Orphaned". This happens when a worker starts working on a task but crashes before completing the processing.

If you run a server against the same queue, orphaned tasks will be recovered automatically after some time period (i.e., after a few missed heartbeats). Once a task is orphaned, it is no longer cancelable (the latest Web UI disables the cancel button).

Follow-up questions:

  • Are you running a server against the queue these tasks are in?
  • Would you mind running this redis command to see what's in the lease set? (ZRANGE asynq:{default}:lease 0 -1 WITHSCORES)
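For anyone interpreting that output: judging from the results posted later in this thread, the scores in the lease zset are plain Unix timestamps (the lease expiration times), so you can compare them against the current clock. A hypothetical spot check against a local Redis, with the `default` queue name assumed:

```shell
# List each active task ID alongside its lease-expiration score
redis-cli ZRANGE 'asynq:{default}:lease' 0 -1 WITHSCORES

# Scores are Unix timestamps; a score far below "now" suggests a stale lease
date +%s
```

These commands require a live Redis instance, so treat them as a sketch rather than a ready-made diagnostic.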

@mailbaoer commented on GitHub (Mar 18, 2022):

  1. Yes, I started a worker service on my server that only runs tasks in the background; tasks are enqueued by a separate API service (via NewxxxTask).
  2. I ran the command, but it didn't return anything.
    ![screenshot-20220318-103721](https://user-images.githubusercontent.com/5282978/158927018-6fffe5ba-3d3b-4f1f-b874-2f01d5659a73.png)

@hibiken commented on GitHub (Mar 18, 2022):

Ok, thanks for providing that info.

Would you mind running this command:

ZRANGE asynq:{default}:deadlines 0 -1 WITHSCORES

@mailbaoer commented on GitHub (Mar 18, 2022):

![image](https://user-images.githubusercontent.com/5282978/158934911-c7011ee4-b83d-4b6c-91d2-cecb6813b84f.png)
All the tasks have stayed in the running state for many days, and they're still running now. I've also upgraded asynqmon to 0.6.1.

![screenshot-20220318-120234](https://user-images.githubusercontent.com/5282978/158935252-716d1eb3-c2c8-4eec-9659-05f7d563a7be.png)


@hibiken commented on GitHub (Mar 18, 2022):

That's very strange. I thought you'd have entries in either asynq:{default}:deadlines (used by v0.21.x or below) or asynq:{default}:lease (used by v0.22.x). These zsets are used to recover orphaned tasks in case of a worker crash, and the fact that there are no entries there suggests something unexpected happened.
I'll keep this bug open to see if others have encountered a similar issue and to gather more context.

Please let me know if you can reproduce this, I'd like to know how to reproduce this bug.


If you need to address this manually, you can get a list of "active" tasks and put their IDs back in the pending list (note: the IDs you see in the image above are just prefixes, so make sure to click into each row to get the full ID).
Once you have the IDs, you can:

  1. delete them from the active list (LREM asynq:{default}:active 1 <task_id>)
  2. put them back on the pending list (LPUSH asynq:{default}:pending <task_id>)
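Strung together, the two steps above might look like the following (a sketch only: it assumes redis-cli pointed at the right Redis instance, the default queue, and the full task ID rather than the truncated prefix shown in the UI):

```shell
TASK_ID="<full-task-id>"  # the full ID from the task detail view, not the prefix

# 1) Remove the stuck ID from the active list
redis-cli LREM 'asynq:{default}:active' 1 "$TASK_ID"

# 2) Push it back onto the pending list so a worker can pick it up again
redis-cli LPUSH 'asynq:{default}:pending' "$TASK_ID"
```

Repeat per stuck task; this only moves the ID between lists and assumes the task's hash data is still present in Redis.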

@mailbaoer commented on GitHub (Mar 21, 2022):

Thank you very much for your patience in answering. If I encounter this problem again, I will check whether I can reproduce it.


@namhq1989 commented on GitHub (Mar 22, 2022):

Any update? I ran into this bug too. asynq v0.22.1, redis v5.0.7.


@hibiken commented on GitHub (Mar 22, 2022):

@namhq1989 Thanks for the comment. We're looking for a way to reproduce this.

For anyone who has experienced this bug, please provide the following:

  • If possible, the steps to reproduce this bug.
  • Asynq version
  • Output of the following command, replacing <qname> with the queue name (e.g. asynq:{default}:lease):
    • (for asynq v0.22.x or above): ZRANGE asynq:{<qname>}:lease 0 -1 WITHSCORES
    • (for asynq v0.21.x or below): ZRANGE asynq:{<qname>}:deadlines 0 -1 WITHSCORES
  • Whether the IDs of the orphaned tasks are in the output above

@dokudoki commented on GitHub (Mar 31, 2022):

  • I think I ran into this issue when the worker threw a "fatal error" and exited.
  • Asynq version v0.21.0
 1) "4d841b6c-1de5-4a70-9233-db16a2493831"
 2) "1648764430"
 3) "af985397-6ee3-4f4d-825d-e926a0a6d1cd"
 4) "1648764430"
 5) "10dff61b-9f5c-4d30-a2f6-42cdb9a05f91"
 6) "1648764440"
 7) "45a12b75-0b83-4e2b-bc7c-93357207ed36"
 8) "1648764440"
 9) "d40431d2-e634-4922-acd0-ab435ec6ac47"
10) "1648764440"
11) "2117339c-db7a-4790-a255-7c9ef0a19b74"
12) "1648765083"
13) "5c1fb99c-60ec-49b1-b7d1-04d953c9d9f0"
14) "1648765083"
15) "932ad61e-d2c1-4f7f-8719-9af9db04f37a"
16) "1648765882"
17) "e1ae65d7-978c-4ac0-a152-3312e458df5e"
18) "1648765882"

@piperck commented on GitHub (May 4, 2022):

I ran into this too. If it happens again, I will try to investigate it.


@paveljanda commented on GitHub (May 12, 2022):

Hi, we never hit this problem, as we have never used asynq before. BUT if anyone could describe how to reproduce this bug and it got fixed, that would help us choose the right distributed task queue for our projects. We are currently choosing from about five contenders.

Thanks a lot to everyone!


@zijiwork commented on GitHub (Jul 15, 2022):

I encountered this problem with the following versions: after the worker restarts, the tasks stay in the active state forever and cannot be canceled.

asynq v0.19.0
asynqmon v0.4.0
redis_version 5.0.4
luban:0>ZRANGE asynq:{critical}:deadlines 0 -1 WITHSCORES
1) "bab56af5-f43e-4192-a5f3-b7eb03246b8b"
2) "1657874411"
3) "f559737f-f729-4716-bfc5-ed0fe6a720a4"
4) "1657874530"
5) "3b839bf5-9ba8-43c9-bc21-906a925e945e"
6) "1657875931"
7) "77147b6f-095d-47d6-a2ce-35d3ad3d80e2"
8) "1657875957"

@KrokoYR commented on GitHub (Oct 8, 2024):

We accidentally ran into this kind of issue. What we did:

  • deployed a new queue service (let's call it NEW) with this setup: redis.Options{Addr: "{addr}", DB: 10}
  • the already existing service (let's call it OLD) had the following setup: redis.Options{Addr: "{addr}", DB: 0}
  • both services have a queue with the same name, say some_queue

This led to the services "stealing" each other's tasks. Our mistake was that we didn't pass the DB number into asynq.RedisClientOpt. Maybe it will help someone.
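One way to catch this kind of misconfiguration is to look at which logical Redis DB the asynq keys actually live in. A hypothetical check with redis-cli (the DB numbers 0 and 10 are taken from the setup described above):

```shell
# If the two services are really isolated, each DB holds its own asynq keys.
redis-cli -n 0  --scan --pattern 'asynq:*' | head
redis-cli -n 10 --scan --pattern 'asynq:*' | head
```

If keys for the same queue appear only in DB 0, both deployments are likely ignoring the configured DB (here, because the number wasn't passed through to asynq.RedisClientOpt).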
