[GH-ISSUE #231] Instability (requiring a daily reboot) since 0.2.20 update (now on 0.2.21) #2088

Closed
opened 2026-03-14 02:25:18 +03:00 by kerem · 8 comments
Owner

Originally created by @rtwright68 on GitHub (Jan 5, 2021).
Original GitHub issue: https://github.com/amidaware/tacticalrmm/issues/231

Running a VMware VM (2 CPUs, 8GB RAM, 250GB storage) on Ubuntu.
475 agents total. Having issues with Take Control, and then agents lose contact.
A reboot fixes it for about another day (rebooted around the same time yesterday).
CPU & RAM usage look normal. Read/write vdisk latency looks good.
df shows:
Filesystem 1K-blocks Used Available Use% Mounted on
udev 4032736 0 4032736 0% /dev
tmpfs 815336 1248 814088 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 127974628 19797204 101633656 17% /
tmpfs 4076660 276 4076384 1% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 4076660 0 4076660 0% /sys/fs/cgroup
/dev/sda2 999320 202092 728416 22% /boot
/dev/loop0 56704 56704 0 100% /snap/core18/1932
/dev/loop1 73088 73088 0 100% /snap/lxd/16099
/dev/loop2 56832 56832 0 100% /snap/core18/1944
/dev/loop4 31872 31872 0 100% /snap/snapd/10707
/dev/loop3 31872 31872 0 100% /snap/snapd/10492
/dev/loop5 69376 69376 0 100% /snap/lxd/18150
tmpfs 815332 0 815332 0% /run/user/1000

Not sure what else to check. It was rock solid running 0.2.18.

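For reference, a minimal sketch of some extra host-level checks that could be run on the Ubuntu VM in a case like this (generic commands, not taken from the thread):

```bash
# Overall memory and swap usage (the thread later points toward a memory leak)
free -h

# Out-of-memory kills recorded in the kernel log
dmesg -T | grep -iE 'out of memory|oom'

# Processes currently using the most memory
ps aux --sort=-%mem | head -n 15

# Any failed systemd units
systemctl --failed
```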
kerem closed this issue 2026-03-14 02:25:23 +03:00
Author
Owner

@azulskyknight commented on GitHub (Jan 5, 2021):

Is the only issue with take control? If so, perhaps you should stop rebooting and try the big red recover connection button at the top?

Mesh was updated in 0.2.20; this causes all the mesh agents to update, and sometimes they go a bit nuts and Tactical has to track the "new" mesh agents down again. If you log into Mesh itself and look, you'll see duplicate agents when this happens.

But again that's just mesh issues. If the RMM agents themselves are dropping that's a different issue.

Author
Owner

@rtwright68 commented on GitHub (Jan 5, 2021):

Definitely tried the recover agent and that has helped in some cases. The odd thing that triggers the need to reboot is that communication between the agents and the server starts dropping; rebooting fixes that.

Author
Owner

@bbrendon commented on GitHub (Jan 5, 2021):

How many total agents do you have? Have you looked around in /var/log/* for problems?

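As a hedged example of that kind of log sweep (file paths and search patterns are assumptions; adjust to the install):

```bash
# Recent errors in the system log (path assumes a default Ubuntu rsyslog setup)
grep -iE 'error|fatal|killed|timeout' /var/log/syslog | tail -n 50

# Error-level journal entries since the last boot
journalctl -p err -b --no-pager | tail -n 50
```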
Author
Owner

@rtwright68 commented on GitHub (Jan 6, 2021):

We now have 478 agents. Will look through /var/log/ to see if anything is obvious.

Author
Owner

@wh1te909 commented on GitHub (Jan 7, 2021):

> Definitely tried the recover agent and that has helped in some cases. The odd thing that triggers the need to reboot is that communication between the agents and the server starts dropping; rebooting fixes that.

What exactly is dropping? Just mesh communication, or are agents showing offline in the Tactical UI?
Also, what model is your CPU? 2 CPUs seems very low for 478 agents.

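If agents are showing offline in the Tactical UI, one thing worth checking server-side is whether the backend services are still healthy; a rough sketch (the unit names assume a typical Tactical RMM install and may differ):

```bash
# Check the state of the main Tactical RMM and MeshCentral services
# (service names are assumptions based on a standard install)
sudo systemctl status rmm celery celerybeat nats meshcentral nginx --no-pager

# Watch live resource usage while agents are dropping, sorted by memory
top -o %MEM
```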
Author
Owner

@rtwright68 commented on GitHub (Jan 7, 2021):

We have seen both. It's a VMware VM, so I will boost the CPU count. The CPUs on the ESXi hosts are: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

Author
Owner

@rtwright68 commented on GitHub (Jan 11, 2021):

TRMM worked great over the weekend, but it still appears to be suffering from the memory leak issue.

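One way to confirm a leak like this, rather than rebooting on a schedule, is to log memory usage over a day and see whether it climbs; a minimal sketch, with the interval and log path chosen arbitrarily:

```bash
#!/bin/bash
# Record a timestamped snapshot of memory usage and the top consumers
# every 5 minutes; review /tmp/mem-usage.log after a day of uptime.
while true; do
    {
        date
        free -m | grep -i mem
        ps aux --sort=-%mem | head -n 5
        echo "---"
    } >> /tmp/mem-usage.log
    sleep 300
done
```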
Author
Owner

@wh1te909 commented on GitHub (Jan 17, 2021):

Please upgrade to the latest version, check the 0.3.0 release notes for the migration guide, then let me know if there are still issues.
