mirror of
https://github.com/amidaware/tacticalrmm.git
synced 2026-04-26 15:05:57 +03:00
[GH-ISSUE #262] services crash when installed along-side other software #170
Originally created by @bbrendon on GitHub (Jan 27, 2021).
Original GitHub issue: https://github.com/amidaware/tacticalrmm/issues/262
We've been having a problem for the last few weeks where services start crashing randomly on servers. The reason this got so much attention internally is because dhcpserver kept crashing. Some of the services that crash are:
nxlog, dhcpserver, Wecsvc, Schedule, lmhosts, eventlog
The most popular service to crash is nxlog
EDIT: I should add that by "crashing", I mean it didn't recover. It seems that after digging further many services crash but only a few don't recover.
This has been seen on OS: 2008r2, 2016std, win10pro, sbs2011
All of these servers also have another program that collects event logs which doesn't crash.
Timeline from today's crashes
1/27/2021 - 09:30 - updated Tactical from 0.4.0 to 0.4.1 (agent 1.3 to 1.4)
1/27/2021 - 09:47 - Monitoring triggered 4x servers, all different customers, all at about exactly the same time.
Since this seems related to nxlog + Tactical, my next step is to check which version of nxlog these machines have and see whether I can predict future crashes based on nxlog being installed and its version.
@bbrendon commented on GitHub (Jan 27, 2021):
After looking through the System event logs on h-dc2, I see a bunch of services stopped immediately after the tactical agent stops, which I'm presuming is because of the 1.3.0 to 1.4.0 agent update. Many are "stopped" and many are terminated unexpectedly. By "many", I mean about 20+ services. Somehow the server didn't totally melt down.
After all the stops (error and informational events), tactical starts and other services begin starting again. But it seems not all services recover fully.
@bbrendon commented on GitHub (Jan 27, 2021):
I inspected the System Event log on h-dc1 (does not have nxlog). It looks much better. No errors. There are a few services that start and stop around the time of the tactical update, but they seem to be doing it by design.
Services restarted : windows module installer service, appxsvc, diagnostic host service
No critical services were restarted.
@rtwright68 commented on GitHub (Jan 27, 2021):
Funny, we have been seeing the same exact thing. The same services you mentioned above are what we are observing. I haven't updated to 0.4.1 yet since I have some agents that haven't updated from 1.12.
@bbrendon commented on GitHub (Jan 27, 2021):
@rtwright68 Are you running nxlog as well? This has been going on for a few weeks. It's not specific to agent 1.4.0
@bbrendon commented on GitHub (Jan 27, 2021):
I just saw someone post this in the chat. This looks exactly like it! Well, not exactly, but crazy similar.
@tektrak commented on GitHub (Jan 29, 2021):
I've seen the same thing. Didn't really notice the details until last night when updating Tactical RMM Agents since the updates run in the background. However, the log file reveals this issue. One server, a Windows Remote Desktop Host, showed some files in use by a number of services. The updater tried to kill the service processes, but had problems, aborted the install, and rolled back. However, a number of services did suffer in the process. Attached is the contents of the tacticalrmm.txt file from C:\Windows\temp\tacticalrmmxxxxxxxxx (although the line endings of this file may have changed in the transfers between systems).
tacticalrmm.txt
It seems like it would be preferable to delay completion of the upgrade until after a reboot rather than killing off things that might be using the files.
By the way, we are not running nxlog.
@wh1te909 commented on GitHub (Jan 29, 2021):
ive changed the inno setup executable in agent 1.4.1 to restart applications that were closed during an update
as a test can you guys please try the following:
un-check auto agent update in Settings > Global Settings
update rmm to 0.4.2 if not done so already via update.sh script
download agent 1.4.1 from https://github.com/wh1te909/rmmagent/releases/download/v1.4.1/winagent-v1.4.1.exe
put it somewhere on your agents filesystem then open cmd as admin, cd to the directory of the exe and call it like this please
winagent-v1.4.1.exe /VERYSILENT /SUPPRESSMSGBOXES /LOG=test123.txt
make note of the services that were being stopped but not restarting and see if now they restart and also paste the test123.txt here for me to see. thx.
@tektrak commented on GitHub (Jan 30, 2021):
On the client server running Windows Server 2012 R2 Standard that had problems last time, I ran the attempted upgrade from 1.4.0 to 1.4.1 manually as described above. The upgrade failed and afterwards some services were not restarted. Here is the status of the services after this attempt for the services indicated in the log file attached:
After noting the status, I manually restarted the stopped services without issue.
test123.txt
@wh1te909 commented on GitHub (Jan 30, 2021):
@tektrak thanks can u try now same exe call it like this plz
winagent-v1.4.1.exe /VERYSILENT /FORCECLOSEAPPLICATIONS /LOG=testforceclose.txt
@tektrak commented on GitHub (Jan 30, 2021):
The agent seems to have updated now. Attached is the log file.
testforceclose.txt
@wh1te909 commented on GitHub (Jan 30, 2021):
thanks. seems this time there were no applications in use so can you keep trying until u can get the log to show
RestartManager found an application using one of our files ...
and then i want to see if it will restart them. the original log file says it was aborted, which is the default when using the /SUPPRESSMSGBOXES flag so was hoping that removing that flag would make it retry
Although i still dont understand why its saying it found applications in use. the inno setup exe stops and kills all tacticalrmm.exe processes before it starts to do the update so not sure whats going on. tactical does not interact with any of those services especially nxlog, never even heard of that. and am not able to reproduce on any of my agents
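For anyone collecting these installer logs, a minimal Python sketch of how the RestartManager message quoted above could be searched for in a log written with /LOG=. The function names and the encoding handling are assumptions for illustration, not part of Tactical RMM or Inno Setup; only the marker text comes from this thread.

```python
from typing import Iterable, List

# Marker text as quoted in this thread; the rest of each log line is not assumed.
MARKER = "RestartManager found an application using one of our files"


def find_restart_manager_hits(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains the RestartManager marker."""
    return [line.rstrip("\r\n") for line in lines if MARKER in line]


def load_log(path: str) -> List[str]:
    """Read an installer log as text (hypothetical helper).

    The encoding is an assumption: utf-8-sig drops a BOM if present, and
    errors="replace" tolerates logs whose bytes were altered in transfer.
    """
    with open(path, encoding="utf-8-sig", errors="replace") as handle:
        return handle.readlines()
```

Calling `find_restart_manager_hits(load_log("testforceclose.txt"))` would return an empty list for a run where no files were in use, which is the case described above.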
@tektrak commented on GitHub (Jan 30, 2021):
I'll work through other updates and note any issues.
I just tried a Windows 10 Pro workstation that seemed to be stuck on v1.1.11.
So I tried running
It failed. Attached is the log.
log-v1.1.2.txt
@wh1te909 commented on GitHub (Jan 30, 2021):
u need to use 1.1.12, not 1.1.2
@tektrak commented on GitHub (Jan 30, 2021):
Oops. I'll retry that one.
But here is the output from a Windows Server 2016 Standard that is a Windows Remote Desktop Host being upgraded from agent v1.4.0 to v1.4.1
.\winagent-v1.4.1.exe /VERYSILENT /SUPPRESSMSGBOXES /LOG=log-v1.4.1.txt
Here are the services that got stopped:
log-v1.4.1.txt
@wh1te909 commented on GitHub (Jan 30, 2021):
replace /SUPPRESSMSGBOXES with /FORCECLOSEAPPLICATIONS when u call the exe
suppress will by default abort. i want to see if without that flag it will restart them
@bbrendon commented on GitHub (Jan 30, 2021):
Ran....
winagent-v1.4.1.exe /VERYSILENT /FORCECLOSEAPPLICATIONS /LOG=log-v1.4.1.txt
After some seconds, this dialogue box appeared.
...I tried selecting "try again", but that didn't do anything so I selected ignore and continue.
After all was said and done, the list of automatic services that were running before and after the agent upgrade did not change. The list is below. It appears though that some of these should be running and were not. I think this was because the server was never rebooted after the last agent upgrade debacle.
Log below.
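The before/after comparison described above can be sketched in Python roughly as follows. This assumes the names of the running automatic services were captured into two lists by some other means (for example, exported to text files before and after the upgrade); the function name and the returned keys are made up for illustration.

```python
from typing import Dict, Iterable, List


def diff_running_services(before: Iterable[str], after: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two snapshots of running service names.

    Names are lower-cased before comparing, since Windows service names
    are not case-sensitive. Blank entries are ignored.
    """
    before_set = {name.strip().lower() for name in before if name.strip()}
    after_set = {name.strip().lower() for name in after if name.strip()}
    return {
        # Running before the upgrade but not after it.
        "stopped": sorted(before_set - after_set),
        # Running after the upgrade but not before it.
        "started": sorted(after_set - before_set),
    }
```

An empty "stopped" list would correspond to the result reported here, where the set of running automatic services did not change across the upgrade.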
@tektrak commented on GitHub (Jan 30, 2021):
Here is a Windows Server 2012 R2 Standard that was upgraded successfully from v1.4.0 to v1.4.1.
.\winagent-v1.4.1.exe /VERYSILENT /FORCECLOSEAPPLICATIONS /LOG=log-v1.4.1.txt
All of the services mentioned:
were running after the upgrade.
log-v1.4.1.txt
@rtwright68 commented on GitHub (Jan 30, 2021):
log-v141-1.txt
v141-2.txt
Ran on a couple Windows 2019 VMs that were stuck on 1.1.12. Still showing the old version number in the agent dashboard.
@rtwright68 commented on GitHub (Jan 30, 2021):
One other piece of info. Both of the agents I attempted the 1.4.1 update on are currently yellow. All services are up and running at this point, attempted a reboot on one of the agents.
@tektrak commented on GitHub (Jan 30, 2021):
@rtwright68 I had some older agents on v1.1.11 and v1.1.12. The v1.1.11 agent I first updated to v1.1.12. Then updated the v1.1.12 agents to v1.2.0, then v1.3.0, then v1.4.1. I may not have needed to do all these intermediate steps, but I believe at least that you shouldn't skip v1.3.0 before going to v1.4.0 or v1.4.1.
@wh1te909 commented on GitHub (Jan 30, 2021):
@tektrak yes that's correct, always need to update incremental based on the minor version number so you did good
@rtwright68 you need to uninstall those agents they are broken, straight upgrade from 1.1.12 to 1.4.1 will break the agents
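As a rough illustration of the "one minor version at a time" rule above, here is a Python sketch that plans the intermediate stops between an installed agent version and a target. The release list is hypothetical (built from version numbers mentioned in this thread), and choosing the highest patch release within each minor version is an assumption; the thread only states that minor versions must not be skipped.

```python
from typing import Dict, Iterable, List, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '1.4.1' or 'v1.4.1' into (1, 4, 1)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(installed: str, target: str, releases: Iterable[str]) -> List[str]:
    """Return one stop per minor version between installed and target.

    Assumption: within each (major, minor) pair, the highest patch release
    that is newer than `installed` and not newer than `target` is used.
    """
    current, goal = parse(installed), parse(target)
    best: Dict[Version, Version] = {}
    for release in releases:
        candidate = parse(release)
        if current < candidate <= goal:
            key = candidate[:2]  # (major, minor)
            if key not in best or candidate > best[key]:
                best[key] = candidate
    return [".".join(str(part) for part in best[key]) for key in sorted(best)]


# Hypothetical release list, taken from versions mentioned in this thread.
KNOWN_RELEASES = ["1.1.11", "1.1.12", "1.2.0", "1.3.0", "1.4.0", "1.4.1"]
```

With this list, `upgrade_path("1.1.11", "1.4.1", KNOWN_RELEASES)` yields the same sequence @tektrak described: 1.1.12, then 1.2.0, then 1.3.0, then 1.4.1.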
@wh1te909 commented on GitHub (Jan 30, 2021):
are you guys able to try this with this exe please?
winagent-v1.3.555.zip
please run it like this and then upload the txt file
then wait like 15 seconds and check if the exe was replaced by running
and see if it shows version 1.3.555
ive changed this exe to not close and not restart any services, since my theory is that those services are not in use anyway by tactical so no point in trying to close them
@tektrak commented on GitHub (Jan 31, 2021):
Here's the first upgrade I tried on the first Windows Server 2012 R2 Standard server I mentioned above. It upgraded fine to the new version and connected to the TRMM server. The log is attached.
I'll also try it on a few more previously troublesome systems.
13555.txt
@tektrak commented on GitHub (Jan 31, 2021):
And here's another run on the Windows Server 2016 Standard server mentioned above. It also upgraded fine to the new version and connected to the TRMM server. The log is attached.
13555.txt
No services were harmed in the making of these upgrades. Thanks!
@tektrak commented on GitHub (Jan 31, 2021):
Just ran the v1.3.555 agent upgrade test on 3 more servers including a Microsoft Windows Server Core 2016 without incident.
@wh1te909 commented on GitHub (Jan 31, 2021):
@tektrak that's amazing wow! really hope this is the fix lol
@bbrendon can u try plz? if it works for you i'll get this released asap
@bbrendon commented on GitHub (Jan 31, 2021):
Seems like it worked. Log https://pastebin.com/QpNs3m34
It just had an issue with nssm.exe
No service issues that I could see.
@wh1te909 commented on GitHub (Feb 1, 2021):
@bbrendon thanks. I'll be getting rid of nssm eventually since it's no longer actively maintained, for now ive just changed the updater to not attempt to replace that file.
I'll be releasing an update to rmm shortly and with it agent v1.4.2
Since I changed the function inside the agent that handles agent update, when your agents update to 1.4.2 they will probably still attempt to force close those services, since they will still be using the code from 1.4.1. The issue won't be fully resolved until the next agent after 1.4.2, so you might still need to manually update agents until they are all on 1.4.2
@tektrak commented on GitHub (Feb 2, 2021):
I have installed the rmm server update and more than half the agents (the half that are online now, including the servers) are now on the new v1.4.2. They seem to have updated without incident. I updated two servers via command line so that I could watch the process in more detail. The rest were manually updated via the web interface, as I currently have agent auto update disabled. Thanks for your efforts improving the update process!
@bbrendon commented on GitHub (Feb 2, 2021):
Same here. No sirens went off. Looking good.