Mono Threads
I've been tweaking what I monitor and it seems to me mono is misbehaving for me - though CMX might be contributing. I'm running mono 3.2.8 which seems to be the general recommendation.
The local dashboard seems to updating every 3 seconds when I watch it, and observations seem to be posting to my web server and Weather Underground (though the latter shows a flat spot in the temp graph for about 30 minutes, where there's normally a little "jitter" in the readings.) Since the Pi's only real task is Cumulus, it doesn't yet seem to be causing much of a functional problem. My goal though, it to try to
anticipate a problem and either restart CMX, or reboot the Pi to avoid getting to the point where performance degrades.
The load average has been creeping up, but it cycles (as illustrated in the graph above). A few hours ago I saw the one minute average drop as low as 0.41 and as high as 11.05. Looking at top, 2 things concern me:
- Some swap space is being used. In my setup that might be hitting the SD card, though if the green LED is blinking, I'm not seeing it. (Most of the IO is going to an SSD.) In my experience once this starts it seldom decreases.
- The mono process is showing almost 80 %CPU (which I believe means 80% of one core). After a reboot/restart it mostly reports 0%, with an occasional blip to 5-6%.
Code: Select all
top - 09:50:25 up 7 days, 13:18, 2 users, load average: 4.46, 3.73, 4.13
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.3 us, 12.9 sy, 0.0 ni, 79.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 945512 total, 904156 used, 41356 free, 363488 buffers
KiB Swap: 102396 total, 232 used, 102164 free. 112260 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
540 root 20 0 720100 360148 18176 S 79.2 38.1 5630:58 mono
30544 pi 20 0 5108 2580 2164 R 0.7 0.3 0:00.57 top
29304 pi 20 0 12196 3716 3000 S 0.3 0.4 0:00.74 sshd
I'm monitoring the number of mono threads every 10 minutes. A snippet of that log below shows 333 mono threads running at this point. I'm not aware of anything beside CMX running under mono. After a restart I saw 14 or 15 threads for some days. Then it jumped to ~ 90, then ~ 180, then ~ 270 where it stayed for a few more days. (Interesting pattern, 1*90, 2*90, and 3*90 approx.) It's been around 333 for a day +/-.
Code: Select all
0 10/18/17 09:24:01 -0400
1 top - 09:24:01 up 7 days, 12:52, 2 users, load average: 6.75, 5.13, 4.74
2 Threads: 478 total, 4 running, 474 sleeping, 0 stopped, 0 zombie
3 %Cpu(s): 5.5 us, 10.5 sy, 0.0 ni, 83.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
4 KiB Mem: 945512 total, 902760 used, 42752 free, 362776 buffers
5 KiB Swap: 102396 total, 232 used, 102164 free. 112156 cached Mem
6
7 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8 540 root 20 0 720100 359792 18176 S 0.0 38.1 0:43.82 mono
9 632 root 20 0 720100 359792 18176 S 0.0 38.1 1:31.85 mono
10 737 root 20 0 720100 359792 18176 S 0.0 38.1 0:01.83 mono
11 823 root 20 0 720100 359792 18176 S 0.0 38.1 8:44.81 mono
12 824 root 20 0 720100 359792 18176 S 0.0 38.1 0:01.55 mono
13 825 root 20 0 720100 359792 18176 S 0.0 38.1 0:47.39 mono
14 826 root 20 0 720100 359792 18176 S 0.0 38.1 21:34.51 mono
15 827 root 20 0 720100 359792 18176 S 0.0 38.1 3:04.46 mono
. . . . . . .
335 24498 root 20 0 720100 359792 18176 S 0.0 38.1 1:28.23 mono
336 24499 root 20 0 720100 359792 18176 S 0.0 38.1 1:27.57 mono
337 24500 root 20 0 720100 359792 18176 S 0.0 38.1 1:28.44 mono
338 29422 root 20 0 720100 359792 18176 S 0.0 38.1 0:12.30 mono
339 30284 root 20 0 720100 359792 18176 S 0.0 38.1 0:01.49 mono
340 30292 root 20 0 720100 359792 18176 S 0.0 38.1 0:00.18 mono
341 22969 pi 20 0 4636 2656 2460 S 0.0 0.3 0:00.34 top_mono.sh
Cumulus has been up for about a week, and it looks like I rebooted the same day. If I recall, the processor loading clears with a CMX restart, but gets into this cycling within a few days (rather than a week) without a reboot. I may try that again to see...
Code: Select all
● cumulusmx.service - CumulusMX Service
Loaded: loaded (/lib/systemd/system/cumulusmx.service; enabled)
Active: active (running) since Tue 2017-10-10 20:31:50 EDT; 1 weeks 0 days ago
Main PID: 540 (mono)
CGroup: /system.slice/cumulusmx.service
└─540 /usr/bin/mono /mnt/root/home/pi/Cumulus_MX/CumulusMX.exe
Oct 10 20:31:55 raspberrypi_02 mono[540]: Fine Offset station found
Oct 10 20:31:55 raspberrypi_02 mono[540]: Connected to station
Oct 10 20:31:57 raspberrypi_02 mono[540]: Cumulus running at: http://*:8998/
Oct 10 20:31:57 raspberrypi_02 mono[540]: (Replace * with any IP address on this machine, or localhost)
Oct 10 20:31:57 raspberrypi_02 mono[540]: Starting web socket server on port 8002
Oct 10 20:31:57 raspberrypi_02 mono[540]: 10/10/2017 8:31:57 PM
Oct 10 20:31:57 raspberrypi_02 mono[540]: Type Ctrl-C to terminate
Oct 17 23:05:01 raspberrypi_02 mono[540]: Warning: Degraded allocation. Consider increasing nursery-size if th...ists.
Oct 17 23:10:01 raspberrypi_02 mono[540]: Warning: Degraded allocation. Consider increasing nursery-size if th...ists.
Oct 18 00:05:01 raspberrypi_02 mono[540]: Warning: Repeated degraded allocation. Consider increasing nursery-size.
With the service in place, a reboot starts Cumulus pretty reliably, and reboot is surprisingly quick. My sense is for the time required, a reboot seems to forestall some of what I'm seeing for 2-3 as long as a CMX restart.
I thought I had resolved the
nursery-size warning with this in the service definition...
Code: Select all
Environment=MONO_GC_PARAMS=nursery-size=8m
but maybe I need to increase it further.
Bottom line at this point is it seems to be getting the job done, other than the little blip at WU (MXdiags showed 1 exception near the end of that blip period). I may just keep and eye on it during the day and see if a functional issue surfaces. Swap usage is very small, though increasing slowly.
Bob