[GH-ISSUE #1] Linux Agent goes offline #79

Closed
opened 2026-03-14 01:31:13 +03:00 by kerem · 20 comments
Owner

Originally created by @ryszard-suchocki on GitHub (Mar 21, 2022).
Original GitHub issue: https://github.com/amidaware/rmmagent/issues/1

Originally assigned to: @wh1te909 on GitHub.

Hi,
I'm testing the community beta Linux Agent for TRMM. I want to report that after a while the Linux agent goes offline (its status changes to offline), although the checks work fine. It is also possible to invoke remote commands, etc., so there is communication between the agent and the server. Could you verify on your side?

• Ubuntu 20.04 x86_64 5.4.0-104-generic • Agent v2.0.0

Temporarily, I'm running the agent by invoking `./rmmagent -m svc`

Best regards

kerem 2026-03-14 01:31:13 +03:00
  • closed this issue
  • added the bug label
Author
Owner

@dinger1986 commented on GitHub (Mar 21, 2022):

Where are your agents hosted?


@ryszard-suchocki commented on GitHub (Mar 21, 2022):

Could you clarify in simpler terms? The whole setup runs in a simple LAN environment. The Linux agent runs on a physical machine with "direct" access to TRMM. Other agents (Windows) communicate fine (local and remote).


@dinger1986 commented on GitHub (Mar 21, 2022):

OK, I am having issues with Amazon-hosted agents, but all the others are fine.


@ryszard-suchocki commented on GitHub (Mar 21, 2022):

Could you elaborate on how you register agents? My approach was:

  1. Click Agents
  2. Install Agent -> Windows; I choose Client, Site
  3. Install Method -> Manual - copy the data required to register a new agent.
  4. On linux box -> ./rmmagent -m install --api https://trmm.tld --client-id X --site-id X --agent-type server --auth a2c4e...XXXXXXXX
  5. ./rmmagent -m svc

@wh1te909 commented on GitHub (Mar 21, 2022):

https://github.com/amidaware/tacticalrmm/blob/develop/api/tacticalrmm/core/agent_linux.sh this should help


@wh1te909 commented on GitHub (Mar 21, 2022):

you need to keep it running via systemd or something similar on your distro


@georgebarnick commented on GitHub (Mar 21, 2022):

Installed using the above script with code-signed agents. Working fine on an Ubuntu 20.04 test VM I made on my local VMware Workstation, with no issues. Then I deployed it on some AWS and Azure VMs I have (a mix of Ubuntu 20.04 and CentOS 7), and I'm having the issue described in the OP: they go offline a few minutes after running their first checks. The agents are running under systemd as suggested, and `systemctl restart tacticalagent.service` brings them back to "online" status in the dashboard, but they slowly go back to offline again. Curious what to try next.

Edit: Further information about some examples of agents below

Agent that's working fine: Ubuntu 20.04 x86_64 5.4.0-105-generic • Agent v2.0.0
AWS Ubuntu agent that's going offline: Ubuntu 20.04 x86_64 5.13.0-1017-aws • Agent v2.0.0
Azure Ubuntu agent that's going offline: Ubuntu 20.04 x86_64 5.13.0-1017-azure • Agent v2.0.0
Azure CentOS agent that's going offline: Centos 7.9.2009 x86_64 3.10.0-1160.53.1.el7.x86_64 • Agent v2.0.0

Happy to provide any other troubleshooting information as-needed.


@wh1te909 commented on GitHub (Mar 21, 2022):

@georgebarnick please enable debug logging so we can see where it's getting stuck.
Modify `/etc/systemd/system/tacticalagent.service` and change

ExecStart=/usr/local/bin/tacticalagent -m svc

to

ExecStart=/usr/local/bin/tacticalagent -m svc -log debug

(add the `-log debug`)
Then run `systemctl daemon-reload && systemctl restart tacticalagent`,
wait for the agent to go offline, and let's see what's in `/var/log/tacticalagent.log`.


@georgebarnick commented on GitHub (Mar 21, 2022):

@wh1te909 So far, the only things in the log after the agent service restarts and runs through its checks the first time are:

time="2022-03-21T20:02:23Z" level=debug msg="Checkrunner sleeping for 120"

every few minutes
and

time="2022-03-21T20:02:24Z" level=debug msg="{Status:{Cmd:/opt/tacticalmesh/meshagent PID:0 Complete:false Exit:-1 Error:fork/exec /opt/tacticalmesh/meshagent: no such file or directory StartTs:1647892944163829150 StopTs:1647892944164151273 Runtime:0 Stdout:[] Stderr:[]} Stdout: Stderr:}\n"

every second.

I installed with the --nomesh flag on most if not all of these VMs that are going offline. Not sure if that's going to be related to the agent going offline or a separate issue, but maybe @ryszard-suchocki can chime in if he has the Mesh Agent with his affected install or not. The reason I did --nomesh was that the install seemed to get stuck on the "Getting mesh node id" step on one of them, so I just decided to omit it from all of them. I could try to reinstall with the mesh agent if you need and have an idea on why it might have gotten stuck there. I'm no expert with MeshCentral yet so haven't troubleshot that myself.
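
For readers wondering where the repeated `fork/exec ... no such file or directory` message comes from: it is the error text Go produces when a program tries to start a binary that does not exist. The sketch below reproduces it with the standard library's `os/exec`. It is an illustration only, not the agent's code; the helper name `meshMissing` and the example path are made up, and the agent may launch the mesh binary through a different package.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os/exec"
)

// meshMissing (hypothetical helper) tries to run the binary at path and
// reports whether the attempt failed because the file does not exist.
func meshMissing(path string) (bool, error) {
	err := exec.Command(path).Run()
	return errors.Is(err, fs.ErrNotExist), err
}

func main() {
	// Example path only; any path to a nonexistent file behaves the same way.
	missing, err := meshMissing("/nonexistent-example-dir/meshagent")
	fmt.Println("binary missing:", missing)
	fmt.Println("error:", err)
}
```

On Linux the printed error has the same `fork/exec <path>: no such file or directory` shape as the log line above, which suggests the message by itself only says that the binary is absent, not that the agent has stopped.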


@ryszard-suchocki commented on GitHub (Mar 21, 2022):

In my case, the Mesh Agent had been installed earlier, separately from TRMM. I did not use the `-nomesh` parameter when "installing" TRMM, so I decided to remove my agent and "install" it again, passing the `-nomesh` and `-log debug` parameters. Despite the `-nomesh` parameter, the log file filled up with:

896886173760991 Runtime:0 Stdout:[] Stderr:[]} Stdout: Stderr:}\n"
time="2022-03-21T22:08:07+01:00" level=debug msg="{Status:{Cmd:/opt/tacticalmesh/meshagent PID:0 Complete:false Exit:-1 Error:fork/exec /opt/tacticalmesh/meshagent: no such file or directory StartTs:1647896887174316611 

So I decided to manually copy the meshagent executable to the specified folder (which did not exist and had to be created manually). Now the log looks like the one below, the agent status is correct, and the last response time is updated correctly:

time="2022-03-21T22:08:08+01:00" level=debug
time="2022-03-21T22:08:08+01:00" level=debug msg="{Status:{Cmd:/opt/tacticalmesh/meshagent PID:249577 Complete:true Exit:0 Error:<nil> StartTs:1647896888175850463 StopTs:1647896888267528965 Runtime:0.091678527 Stdout:[] Stderr:[]} Stdout:\n Stderr:}\n"
time="2022-03-21T22:08:10+01:00" level=debug msg="Checking for windows updates"
time="2022-03-21T22:08:32+01:00" level=debug msg="Checkrunner sleeping for 120"
time="2022-03-21T22:09:06+01:00" level=debug msg="agent-hello {jX****************************cyixu 2.0.0}"
time="2022-03-21T22:10:02+01:00" level=debug msg="agent-hello {jX****************************cyixu 2.0.0}"
time="2022-03-21T22:10:32+01:00" level=debug msg="Checkrunner sleeping for 120"
time="2022-03-21T22:10:58+01:00" level=debug msg="agent-hello {jX****************************cyixu 2.0.0}"

@wh1te909 commented on GitHub (Mar 22, 2022):

Thanks, I will do some testing without mesh. The agent should still check in without mesh, so that is probably a bug.


@wh1te909 commented on GitHub (Mar 22, 2022):

So, from my initial testing with `--nomesh` (about 12 hours now on a few VMs), I get that error in the logs about not finding the executable, which is expected, but the agent continues to check in and doesn't freeze, which is also expected. So I'm still not sure why your agents are going offline. I have not tested on AWS or Azure yet; I will do that today.


@dinger1986 commented on GitHub (Mar 22, 2022):

I have found that some of the agents that were dying stay online after mesh is installed, but some aren't staying online long enough to install mesh, or get stuck on "Getting Mesh node ID". It doesn't seem to be just AWS; it seems to be random machines, across CentOS and Ubuntu.


@ryszard-suchocki commented on GitHub (Mar 22, 2022):

A few moments ago I removed the "mesh agent" executable from "/opt/tacticalmesh" and the issue occurred again. Would someone like to try my "installation" steps? I can share my builds and generated config for analysis. It is worth noting that in my case the "mesh agent" still runs in the background, as it was installed separately.

Edit: I have tried running the RMM Agent on my NAS (Asustor), with the same behavior. Without the "mesh agent" executable the status changes to offline; with the executable placed in "/opt/tacticalmesh" everything works fine.


@wh1te909 commented on GitHub (Mar 25, 2022):

@ryszard-suchocki yes please share your installation steps

I am still unable to reproduce this. I have been testing for a few days now, with and without mesh, on Azure, AWS, Hetzner, etc., and have not been able to reproduce it at all.


@ryszard-suchocki commented on GitHub (Mar 26, 2022):

My env: Proxmox 6.X, agent built in an Ubuntu 20.04 container (ubuntu-20.04-standard_20.04-1_amd64.tar.gz; Rel. 2021-04-05 13:09:49):

  1. Deploy container
  2. apt update && apt upgrade
  3. wget https://go.dev/dl/go1.17.8.linux-amd64.tar.gz && tar -C /usr/local/ -xzf go1.17.8.linux-amd64.tar.gz
  4. nano /etc/environment && add /usr/local/go/bin to PATH
  5. wget https://github.com/amidaware/rmmagent/archive/refs/tags/v2.0.0.zip && apt install unzip
  6. cd rmmagent2.0
  7. env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-s -w"
  8. scp rmmagent executable to destination host

Install:

  1. TRMM UI -> click Agents
  2. Install Agent -> Windows; I choose Client, Site
  3. Install Method -> Manual - copy the data required to register a new agent (-m install --api https://trmm.tld/ --client-id X --site-id X --agent-type server --auth a2c4e...XXXXXXXX)
  4. On Linux box -> ./rmmagent -m install --api https://trmm.tld/ --client-id X --site-id X --agent-type server --auth a2c4e...XXXXXXXX **-nomesh**
  5. ./rmmagent -m svc -l debug

@wh1te909 commented on GitHub (Mar 26, 2022):

@ryszard-suchocki please use the installation script that I linked to in a previous comment and see how that installs it and uses systemd to keep it running


@dinger1986 commented on GitHub (Mar 26, 2022):

Also, can you try "Send Command" with `df -h` and see if it works?

Mine goes offline, but I can still send commands to it.


@wh1te909 commented on GitHub (Mar 26, 2022):

OK all, never mind, I found the bug. I forgot to spawn the function that attempts to sync the meshnodeid into its own goroutine, so it basically hangs forever when mesh is not installed LOL. Will push a fix shortly.


@ryszard-suchocki commented on GitHub (Mar 27, 2022):

Fixed!
