[GH-ISSUE #1250] Agent fails to update because previous update is still running #2717
Originally created by @NiceGuyIT on GitHub (Aug 21, 2022).
Original GitHub issue: https://github.com/amidaware/tacticalrmm/issues/1250
Server Info (please complete the following information):
Installation Method:
Agent Info (please complete the following information):
Describe the bug
One agent is failing to update to v2.2.1. After logging into the computer, I discovered an existing update is pending. The Inno Setup logs show the upgrade failed to delete a file, retried, and ultimately prompted with a "Message box (Abort/Retry/Ignore)". The agent upgrade code starts the command but never checks whether it completed. Yes, I know it can't check whether the command it just spawned finished, because part of the upgrade process is to restart the service. However, it can check whether a previous upgrade is hung or still running. The agent.log shows the agent keeps trying to update but fails because tacticalagent-v2.2.1-windows-amd64.exe is in use by another process. That other process is the previous upgrade that is waiting for a prompt.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect the agent to check if a previous upgrade is hung and handle the situation.
Screenshots

Here is a screenshot of the existing setup from Process Explorer.
Here are the properties of the upgrade process started on 2022-08-19 10:35:05 PM.

Additional context
The Inno Setup log is below.
C:\Program Files\TacticalAgent\agent.log:
C:\Windows\Temp\tacticalrmm3774362645\tacticalrmm.txt:
@NiceGuyIT commented on GitHub (Aug 21, 2022):
Restarting the service does not kill the hung install process. I had to kill the install process myself before the installation was tried again, and it then hung with the same error.
All computers are restarted weekly at 10:00 PM local time. The hung install process was from 10:35:05 PM right after the restart. While restarting the computer and then forcing the update worked today, the automated update from Friday night did not. Also, I believe this agent was not updating before Friday. I just haven't had time to dive into why it wasn't updating.
At the very least, the rmmagent should detect hung upgrades and handle the situation to allow the next update to possibly work.
@NiceGuyIT commented on GitHub (Aug 21, 2022):
The agent.log shows the upgrade has been failing ever since I upgraded the server on 2022-08-09.
@wh1te909 commented on GitHub (Aug 21, 2022):
The agent already has a KillHungUpdates() function and calls it before it attempts to update. However, since the agent binary naming format changed, I only recently added the tacticalagent-v* pattern matching to that function. You are probably still on an agent that doesn't have the updated function, so it won't work until you actually update to that agent; you might need to do this one manually, and in the future it will work.
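For illustration, here is a minimal sketch of what a hung-update killer along the lines of the KillHungUpdates() function described above could look like, using the gopsutil process package and the tacticalagent-v* name prefix. This is only a sketch of the idea, not the agent's actual implementation.

```go
package main

import (
	"log"
	"strings"

	"github.com/shirou/gopsutil/v3/process"
)

// killHungUpdates scans running processes for leftover installers whose
// executable name starts with "tacticalagent-v" and kills them so the next
// update attempt can replace the binary. Sketch only; the agent's real
// KillHungUpdates() may differ.
func killHungUpdates() {
	procs, err := process.Processes()
	if err != nil {
		log.Println("listing processes:", err)
		return
	}
	for _, p := range procs {
		name, err := p.Name()
		if err != nil {
			continue
		}
		if strings.HasPrefix(name, "tacticalagent-v") {
			log.Println("killing hung installer:", name)
			if err := p.Kill(); err != nil {
				log.Println("kill failed:", err)
			}
		}
	}
}

func main() {
	killHungUpdates()
}
```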
@NiceGuyIT commented on GitHub (Aug 21, 2022):
Looks like that was first available in v2.1.0. Closing this as it should no longer be an issue. Thanks!
@NiceGuyIT commented on GitHub (Aug 25, 2022):
While troubleshooting issue #1265, I tried upgrading from version v2.1.1 and it failed several times. If I waited until the installer started and then stopped the tacticalrmm service, the upgrade was successful. I suspect this is a race condition and killing the previous update makes it try again which eventually succeeds.
@NiceGuyIT commented on GitHub (Sep 26, 2022):
This is happening again after upgrading. Linux agents updated without a problem. All but 2 Windows Servers updated without a problem. 12 Windows workstations failed and 3 succeeded. Of the 3 that succeeded, 2 are VMs and 1 is a slow laptop. I believe this is a race condition. How do I troubleshoot?
Previous version:
Current version:
Based on the release dates, I skipped two tacticalagent versions.
@NiceGuyIT commented on GitHub (Sep 26, 2022):
Here's the setup log from the latest run.
C:\Windows\Temp\tacticalrmm366239534\tacticalrmm.txt
@NiceGuyIT commented on GitHub (Sep 26, 2022):
I enabled agent debugging using the script which caused the agent to restart. The agent updated successfully. Does this require the agent to restart before working properly?
@NiceGuyIT commented on GitHub (Sep 26, 2022):
On another computer, I restarted the service and tacticalagent failed to start. I started it manually, forced the update, and it still hung. I enabled debug logging using a script, forced the update, and it worked. I don't know why debug logging makes it work.
@NiceGuyIT commented on GitHub (Sep 26, 2022):
This is pretty consistent. On a third computer, agent debug was enabled, the update was pushed, and it worked.
On a fourth computer, I restarted the tacticalagent service using PowerShell, forced the update, and it failed. I enabled debug logging on the agent, forced the update, and it worked. I have no idea why.
@NiceGuyIT commented on GitHub (Sep 26, 2022):
Here are the exact steps. The remaining agents were able to update without problems using these procedures.
@silversword411 commented on GitHub (Sep 26, 2022):
It's using the wrong directories, those are pre....would have to check version
2022-09-26 12:35:05.178 Created temporary directory: C:\WINDOWS\TEMP\is-K7G4F.tmp
What agent version are those running?
Also https://github.com/amidaware/tacticalrmm/issues/1238
@NiceGuyIT commented on GitHub (Sep 26, 2022):
That's above, v2.2.1 trying to upgrade to v2.4.0.
Even so, turning on debug mode makes everything work, while restarting without debug mode enabled fails. That's what makes this puzzling.
@silversword411 commented on GitHub (Sep 26, 2022):
Yeah, that's weird...must be some interaction with the temp ACL permissions somehow?
@silversword411 commented on GitHub (Sep 26, 2022):
Good to hear those are working....was actually planning on coming back to them to check...maybe rebuild with registry edits instead of one-off cmd launches :)
For you, you probably don't need to disable debug mode since the service is restarted on install.
@NiceGuyIT commented on GitHub (Nov 1, 2022):
I updated last night to v0.15.2 with agent v2.4.1, and it seems most or all of the Windows 10 devices did not update. Windows server agents updated. Since enabling debug mode makes the update work, what can I do to gather logs without enabling debug mode?
@silversword411 commented on GitHub (Nov 4, 2022):
Have you tried:
@NiceGuyIT commented on GitHub (Nov 5, 2022):
Enabling debugging via script command runs the agent as a process, not a service. That's why enabling debugging allows the agent to update.
To troubleshoot this, the log level was changed for the service itself. First, get the current command line.
Set log level to trace.
Verify the new command.
And then restart the service.
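A rough, hypothetical way to script these steps against the Windows service manager is sketched below; the tacticalagent service name and the -log trace flag are assumptions here, not taken from the agent's documentation.

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/windows/svc/mgr"
)

func main() {
	// Connect to the Windows service control manager.
	m, err := mgr.Connect()
	if err != nil {
		log.Fatal(err)
	}
	defer m.Disconnect()

	s, err := m.OpenService("tacticalagent") // assumed service name
	if err != nil {
		log.Fatal(err)
	}
	defer s.Close()

	// Get the current command line.
	cfg, err := s.Config()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("current command line:", cfg.BinaryPathName)

	// Set the log level to trace (flag name assumed) and verify the new command.
	cfg.BinaryPathName += " -log trace"
	if err := s.UpdateConfig(cfg); err != nil {
		log.Fatal(err)
	}
	cfg, _ = s.Config()
	fmt.Println("new command line:", cfg.BinaryPathName)

	// The service still has to be restarted afterwards for the change to apply.
}
```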
With debug logging enabled, an update was pushed. The complete logs are below. The Event Viewer log shows the service was restarted because it terminated unexpectedly.
Here's the timeline of events. The code has a 1 second delay between "Agent updating" and os.Exit(0) in rpc.go. Without millisecond logging to confirm, it seems the os.Exit(0) is causing Windows to think the service terminated unexpectedly.
tacticalagent_update_v2.4.1.txt
agent.log
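As a sketch, the sequence described above (launch the installer, wait about a second, then exit) amounts to something like the following fragment. This is not the actual rpc.go code; the installer path and flags are illustrative only.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	// Start the downloaded installer and deliberately do not wait for it.
	installer := exec.Command(`C:\Windows\Temp\tacticalagent-v2.4.1-windows-amd64.exe`, "/VERYSILENT")
	if err := installer.Start(); err != nil {
		log.Fatal(err)
	}

	log.Println("Agent updating")
	time.Sleep(1 * time.Second)

	// Exiting here means the service control manager never sees a clean Stop,
	// so Windows logs an unexpected termination and restarts the service while
	// the installer is still trying to replace the binary.
	os.Exit(0)
}
```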
@NiceGuyIT commented on GitHub (Nov 5, 2022):
This fixes the problem for me. It uses Tactical's ControlService function to stop itself and doesn't trigger the Windows event log.
As for how to stop a Windows service properly, I found this issue in Kubernetes.
I also found this SO question, Gracefully terminate a process on Windows, where someone mentioned:
For context, I use Bitdefender GravityZone. This could introduce additional latency that is causing this race condition.
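As a sketch of the alternative described here, stopping the service through the service control manager rather than calling os.Exit(0) might look roughly like the following; it is not the agent's actual ControlService helper, and the service name is assumed.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"golang.org/x/sys/windows/svc"
	"golang.org/x/sys/windows/svc/mgr"
)

// stopService asks the Windows service control manager to stop the named
// service and waits for it to reach the Stopped state.
func stopService(name string, timeout time.Duration) error {
	m, err := mgr.Connect()
	if err != nil {
		return err
	}
	defer m.Disconnect()

	s, err := m.OpenService(name)
	if err != nil {
		return err
	}
	defer s.Close()

	status, err := s.Control(svc.Stop)
	if err != nil {
		return err
	}

	deadline := time.Now().Add(timeout)
	for status.State != svc.Stopped {
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for %s to stop", name)
		}
		time.Sleep(300 * time.Millisecond)
		if status, err = s.Query(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := stopService("tacticalagent", 30*time.Second); err != nil {
		log.Fatal(err)
	}
}
```

The key difference from the os.Exit(0) path is that the stop goes through the SCM, so Windows records a normal stop instead of an unexpected termination and does not restart the service mid-install.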
@NiceGuyIT commented on GitHub (Nov 6, 2022):
Here's the timeline for a Windows Server, which does not have Bitdefender and updated successfully. The event logs have the same error, but the install was a little faster. It takes about 4 seconds for both systems to run the install, but the one above is a little slower, causing the service to restart just before the installer tries to replace the file.
@wh1te909 commented on GitHub (Nov 7, 2022):
thank you, can confirm adding the control service line fixes the issue, will be in next release
@silversword411 commented on GitHub (Nov 10, 2022):
Closing...in anticipation of release-and-fix.