CumulusMX Hangs, Slowdowns, and Failure Modes
Posted: Tue 22 Aug 2017 2:29 pm
I'm working on a sort of CumulusMX "watchdog" script to run alongside CumulusMX on a Raspberry Pi 3. Basically, I'm having to restart CumulusMX periodically, and I'd like to automate this as much as possible. I'm using systemd to manage CumulusMX (e.g. https://cumulus.hosiene.co.uk/viewtopic.php?f=27&t=14753). I've been trying to observe and note the various failure modes, although often the status of interest isn't quite a failure per se. I've found that systemd restarts CumulusMX quickly if the process (actually mono) is killed. I plan to share what I learn in case others are interested.
Here are some things I am looking at:
* The <#DataStopped> tag, which I have in a standalone file that will contain either 1 or 0
* The realtime.txt file on my (hosted) web server should update every 30 seconds in my configuration
* A bunch of html and json files should update every 5 minutes in my configuration
* The Raspbian O/S should not get bogged down
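A minimal sketch of how the first three checks might be combined in Python. It takes the text of the DataStopped file plus last-update timestamps for realtime.txt and the 5-minute pages, and returns a list of problems. The function names, the slack factor of 3 intervals, and the reading of 1 as "data stopped" are assumptions for illustration; fetching the files and timestamps from the web server is left out.

```python
import time

REALTIME_INTERVAL_S = 30   # realtime.txt upload interval mentioned above
PAGES_INTERVAL_S = 300     # html/json upload interval mentioned above


def data_stopped(flag_text):
    """True if the standalone <#DataStopped> file contains 1 (assumed to mean 'stopped')."""
    return flag_text.strip() == "1"


def is_stale(last_update, interval_s, now, slack=3.0):
    """True if the last update is older than `slack` intervals (slack is an assumed tolerance)."""
    return (now - last_update) > interval_s * slack


def find_problems(flag_text, realtime_updated, pages_updated, now=None):
    """Return short problem descriptions; an empty list means every check passed."""
    now = time.time() if now is None else now
    problems = []
    if data_stopped(flag_text):
        problems.append("station data stopped")
    if is_stale(realtime_updated, REALTIME_INTERVAL_S, now):
        problems.append("realtime.txt stale")
    if is_stale(pages_updated, PAGES_INTERVAL_S, now):
        problems.append("html/json pages stale")
    return problems
```

Returning a list rather than a single flag keeps the decision (log, email, restart) separate from the checks, which may help while the failure modes are still being worked out.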
In general, a pretty reliable indicator of an issue (which I've not yet automated) seems to be the Dashboard web page, i.e. the default page at http://*:8998/. The local time widget (sample below) should refresh periodically, but at times I've noticed it hang well beyond the normal couple of seconds. When that happens, it looks like the realtime.txt file and the html and json files may not be updating on the server either, but I'm still investigating.
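One way the Dashboard check might be automated is to request the page with a timeout and treat no answer as a symptom. This is only a sketch: the function name and the 10-second default are assumptions, and it only confirms that the page is served, not that the local time widget is still updating.

```python
import http.client
import urllib.request


def dashboard_responds(url, timeout_s=10.0):
    """Return True if the page at `url` answers with HTTP 200 before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read(1)  # make sure at least the start of the body arrives
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are OSError subclasses
        return False
```

On the Pi itself the call would presumably look like dashboard_responds("http://localhost:8998/"), where localhost is an assumption about where the script runs. Note that the urlopen timeout applies to each blocking socket operation rather than to the request as a whole.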
A note on "bogged down": One time when I checked on CMX, I had a tough time doing much in my usual ssh shell (I pretty much run the Pi headless, without the GUI, and do my admin over ssh). The output of top looked very unusual...
Typically, I see CPU usage for mono below 10% at most and a load average pretty close to 0. Memory usage typically runs around 8%. I'm not monitoring that (yet); it seemed like a one-off case. (I'm running Raspbian on a stock Pi 3 with few, if any, OS mods.) Oh, and I'm writing the watchdog in Python because that's big in the embedded (and Pi) world ... and I'm trying to get more comfortable with it.
I'll share more as I figure it out...
Bob
Other Notes
I noticed that when I was running the GUI and CumulusMX (and little else), my Pi was using a little swap space, which I wanted to avoid. I now boot without the GUI, and I'm using only about 50% of memory and am not close to swapping!
I already have a watchdog running for my (aging) web camera. I added a relay module to the Pi to automate what I had previously done by hand - power-cycling the camera. On my (hosted) web server, a cron job already tracks the time since the last image upload (nominally a still image should be uploaded every 5 minutes) and sends me an email if the image is over a certain age. The watchdog pulls down a text file which has the number of seconds since the last image update, and if that is over a threshold, it power-cycles the web cam. I'm using a copy of that script to tinker with the CMX watchdog.
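The core decision in a watchdog like that can be sketched in a few lines of Python. This is not the actual script: the function names are hypothetical, the text file is assumed to hold a single number, and the recovery action (toggling the relay, or restarting CumulusMX) is passed in as a callback so the logic can be tried without any hardware.

```python
def parse_age_seconds(text):
    """Parse the downloaded text file, assumed to hold one number of seconds."""
    return float(text.strip())


def watchdog_step(age_text, threshold_s, recover):
    """Call recover() if the reported age exceeds threshold_s.

    Returns True if recover() was called, False otherwise.
    """
    try:
        age_s = parse_age_seconds(age_text)
    except ValueError:
        return False  # unreadable status: take no action in this sketch
    if age_s > threshold_s:
        recover()
        return True
    return False
```

For example, watchdog_step("1200", 900, power_cycle_camera) would call the (hypothetical) power_cycle_camera function, because 1200 seconds exceeds the assumed 900-second threshold.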
Output of top during that bogged-down episode:
top - 08:30:43 up 2 days, 12:33, 3 users, load average: 15.56, 13.18, 12.78
Tasks: 142 total, 1 running, 141 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.6 us, 19.1 sy, 0.0 ni, 69.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 945512 total, 766604 used, 178908 free, 146700 buffers
KiB Swap: 102396 total, 0 used, 102396 free. 352756 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  529 root      20   0  628000 192208  17652 S  80.1 20.3   1807:22 mono
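The load averages in the top output above (15.56, 13.18, 12.78) are far above the near-zero values described as typical, so load might be worth checking too. A minimal sketch follows. The per-core limit of 1.5 is an arbitrary assumption that would need tuning, and the function names are hypothetical.

```python
import os


def load_is_high(load_1min, cores, per_core_limit=1.5):
    """True if the 1-minute load average exceeds per_core_limit times the core count."""
    return load_1min > cores * per_core_limit


def current_load_is_high(per_core_limit=1.5):
    """Apply the same test to this machine's current 1-minute load average."""
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return load_is_high(load_1min, cores, per_core_limit)
```

With the four cores of a Pi 3, a load of 15.56 is well over the assumed limit of 6, while the usual near-zero load is not.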
Output of free after booting without the GUI:
$ free
             total       used       free     shared    buffers     cached
Mem:        945512     413904     531608      42248     144092     165656
-/+ buffers/cache:     104156     841356
Swap:       102396          0     102396
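If memory and swap were ever added to the watchdog, the same numbers could be read from /proc/meminfo instead of parsing the output of free. The sketch below assumes the standard MemTotal, MemFree, Buffers, Cached, SwapTotal and SwapFree fields; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


def used_fraction(info):
    """Fraction of RAM in use, not counting buffers and cache."""
    used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    return used / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Fed the figures from the free output above, the "used" value works out to 945512 - 531608 - 144092 - 165656 = 104156 kB, which matches the -/+ buffers/cache line, and swap in use is 0.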