user mode systemd unit, with co-ordinated cgroups. #1463
Test Results: 243 tests, 242 ✅, 1m 40s ⏱️. Results for commit 5f3f32f.
Just refreshing the discussion of this PR. 3.01 has gone out, so the concrete problems of the original issue are substantially addressed... so what is this PR proposing?

Right now, metpx-sr3 is started as a system unit by systemd. That means it uses special permissions to use a reserved cgroup that isn't available to sr3 sanity when it runs as a cron job, or to an analyst restarting things manually. So when you do "systemctl status metpx-sr3", it lists only the processes started by systemd itself.

Inconsistencies that can result: if all processes die but are restarted by sanity, then sr3 status will show things as running, but systemctl status metpx-sr3 will show it as down. The more starts and stops an analyst does as the sarra user, the further the systemd and sr3 views of what's running drift apart.

With this PR, you don't use the system daemon startup of metpx-sr3. Instead, you use user-mode systemd: the sarra user runs user-mode units in systemd. If you do it that way, and we set a cgroup that is accessible to the sarra user, then all interventions can join the same cgroup, and the systemd (user) and metpx-sr3 views of what is running can be the same. As the sarra user, systemctl --user status metpx-sr3 will report all the same processes as sr3 status.

Is this a worthwhile/good change? Dunno. People have to kick the tires. Does anyone want to kick the tires? If not, we can just close the PR. I don't think it should sit forever, but I don't know when is a good time to close it.
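For anyone who wants to kick the tires, here is a rough sketch of how the two views could be compared. It assumes cgroup v2, where the `cgroup.procs` file in a cgroup directory lists the PIDs that are directly in that cgroup (not in child cgroups). The function names and the `sr3_pids` argument are hypothetical: how sr3 tracks its own PIDs is not shown in this PR, so the list is simply passed in.

```python
from pathlib import Path


def cgroup_pids(cgroup_dir):
    """Return the set of PIDs listed in a cgroup v2 cgroup.procs file."""
    text = (Path(cgroup_dir) / "cgroup.procs").read_text()
    return {int(token) for token in text.split()}


def compare_views(cgroup_dir, sr3_pids):
    """Return (only_in_cgroup, only_in_sr3) for the two views.

    sr3_pids is a hypothetical input: the PIDs sr3 believes are running.
    """
    in_cgroup = cgroup_pids(cgroup_dir)
    known = set(sr3_pids)
    return in_cgroup - known, known - in_cgroup
```

If both returned sets are empty, the systemd (cgroup) view and the sr3 view agree.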
I do want to experiment with this but haven't had time to try it out |
close #1461
This is option 1 discussed in #1461. I don't think it should be merged... I just want it to sit here for people to try out and think about. I want this PR to stay in draft state and not get merged for a month or two at least.
Add option setting cgroupOverride, example usage:
cgroupOverride /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/metpx-sr3.service
This means that when sr3 starts a process, whether through analyst intervention or sr3 sanity, it adds the launched processes to this cgroup. systemd uses the same cgroup, so when the whole thing is started with systemd, it sees what is already running. The systemd view and the sr3 view of which processes are running should now be consistent.
On the other hand, if systemd thinks the unit is stopped, it does not look at the cgroup, so it will not see processes that were started manually. Once the unit is started, though, it does see the ones that were started manually beforehand.
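As an illustration of the mechanism, not the code in this PR: under cgroup v2, a process is moved into a cgroup by writing its PID to that cgroup's `cgroup.procs` file, which works only if the writer has permission on the cgroup directory. The helper name `join_cgroup` below is hypothetical.

```python
import os
from pathlib import Path


def join_cgroup(cgroup_dir, pid=None):
    """Move a process into a cgroup v2 group by writing its PID to cgroup.procs.

    Returns True on success, False if the cgroup is missing or not writable.
    Hypothetical helper: a sketch of the idea, not sr3's implementation.
    """
    pid = os.getpid() if pid is None else pid
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    try:
        # On a real cgroup filesystem, the kernel migrates the process
        # when its PID is written to this file.
        with open(procs_file, "w") as f:
            f.write(f"{pid}\n")
    except OSError:
        return False
    return True
```

A launcher could call `join_cgroup()` with the path given to cgroupOverride right after starting a process, and carry on as before if it returns False.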