Page 1 of 1

WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 3:55 pm
by radilly
This is a spin-off of WU update: The Task was canceled.

Over the past couple of days I've see some odd behavior with WeatherCloud updating / observation posting. I suspect they changed something which slowed down the reply to an update. I haven't found anything about the protocol by the way, but it certainly looks like HTTP from the Diags log.

At any rate, sometime around 2018-01-23 11:20 (Eastern US), it looks like the reply time from WeatherCloud increased noticeably. Before that it looks like it was running around 30 seconds. Afterward it appears to have jumped to over 1:40 (or 100 secs), what apparently generates this message as the task waiting for the reply times out. (Same timing for WU it appears.) This seemed to occur at every attempt for about 19 hours.

This morning, around 2018-01-24 06:40 (Eastern US), it looks like the reply time dropped just enough to avoid this message ... but with very little margin. The margin looks to be on the order of 10 sec or less, or > 10% if you will. A portion of the Diags log, grepped for WeatherCloud follows below.

In general, it seems like the observations were getting through, so the delay could just be on the confirmation handshake.

I did report this to WeatherCloud but haven't heard anything back ... though maybe the change in timing this morning is a reply of sorts. :roll:

Cheers,
Bob

Code: Select all

20180123-152519.txt:2018-01-24 06:21:40.981 WeatherCloud update: The Task was canceled
20180123-152519.txt:2018-01-24 06:31:41.021 WeatherCloud update: The Task was canceled
20180123-152519.txt:2018-01-24 06:41:34.019 WeatherCloud Response: OK: 200
20180123-152519.txt:2018-01-24 06:51:40.075 WeatherCloud update: The Task was canceled
20180123-152519.txt:2018-01-24 07:00:24.112 WeatherCloud Response: OK: 429
20180123-152519.txt:2018-01-24 07:11:38.906 WeatherCloud Response: OK: 200
20180123-152519.txt:2018-01-24 07:21:36.667 WeatherCloud Response: OK: 200

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 4:27 pm
by ConligWX
DNS issue perhaps? just guessing. seems a rather strange occurance from your system if now two websites are having the same issue of uploading data to the internet. any heavy net use on your network affecting throughput or bad LAN cables? I'm just thinking out loud.

can you ping WU api and weathercloud servers from your cumulusmx box ok? make sure its a ping run for 5 minutes or so.

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 4:28 pm
by radilly
Here's a little speculation... Having done a little work in asynchronous communications, it occurred to me that the fact that WeatherCloud states that the upload frequency is fixed at 10 minutes, it is very likely they have created a (uneven) loading problem for themselves.

Cumulus appears to be very good about initiating uploads at the top of the minute. In other words, 12:00, 12:10, 12:20, .. etc. Now if most of the users submitting uploads use this approach, and have their system clocks well-synched to a time-server, it's not hard to imagine that WeatherCloud's servers get hammered every 10 minutes. That presents a potentially severe scalability challenge for their servers.

So what to do?

I can think of 2 things: (1) to maintain the appealing and aesthetic idea of updates at the top of the (10th) minute, the timeout period could be longer ... up to something just shy of 10 minutes. (2) In mesh networks, the timing of transmissions is generally randomized in some way. Playing with that idea, if one could select a "skew" or delay period, of 0-10 minutes after the top of the (10th) minute for upload, one might be able to tune CMX to avoid this message in Diags. Not as aesthetic but maybe practical. (Yep ... I know Steve is retired!) :!:

Don't know that there's anyway to do this outside of a CMX change... short of messing with your system clock!

Just wanted to make a note of the thought...

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 4:39 pm
by radilly
Toxic17 wrote:DNS issue perhaps? just guessing. seems a rather strange occurance from your system if now two websites are having the same issue of uploading data to the internet. any heavy net use on your network affecting throughput or bad LAN cables? I'm just thinking out loud.

can you ping WU api and weathercloud servers from your cumulusmx box ok? make sure its a ping run for 5 minutes or so.
Hey Simon-

I thought about that sort of thing a little. With some hesitation, I'm inclined to say nothing has changed, and least nothing I can see or think of.

Pinging is not a bad idea. Easy to try. Have you come across any documentation on the interface to WeatherCloud? (I know the WU protocol reasonably well.) I believe ping / ICMP may be handled a little differently, but I should think its the same plumbing between the Pi and their server(s)...

Thanks!
Bob

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 6:02 pm
by ConligWX
I take it your using a good source for ntp? you have the correct time on your system? I point most of my servers and devices to uk.pool.ntp.org

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 6:03 pm
by radilly
OK - Some ping results...

I let this run over 10 minutes just to see if there'd be a spike somewhere. There wasn't. I would say the average was roughly around 150 ms, and the variance on the order of a couple of ms.

Code: Select all

64 bytes from weathercloud.net (188.164.192.31): icmp_seq=240 ttl=53 time=150 ms
Weather Underground seems to run in the low 20 ms range. I've noticed 2 addresses so far. (Likely they have more than that...)

Code: Select all

64 bytes from a23-194-109-128.deploy.static.akamaitechnologies.com (23.194.109.128): icmp_seq=1 ttl=57 time=21.1 ms
64 bytes from a23-194-182-156.deploy.static.akamaitechnologies.com (23.194.182.156): icmp_seq=1 ttl=58 time=24.1 ms
I don't see where either of these represents much latency at this point, relative to the 100 sec timeout.

I'll make one more observation, and the Diags log snippet below illustrates this. WU seems to be healthy again. I have it updating every minute. Assuming that task (and all updates on a multiple of a whole minute) fires at the top of the minute, the reply is coming back very quickly, well under 1 sec. By comparison, it appears that WeatherCloud replied just a half-second before the 1:40 timeout. That's cutting it mighty close.

Code: Select all

2018-01-24 12:30:00.483 WU Response: OK: success

2018-01-24 12:31:00.295 WU Response: OK: success

2018-01-24 12:31:39.560 WeatherCloud Response: OK: 200
2018-01-24 12:32:00.318 WU Response: OK: success

2018-01-24 12:33:00.316 WU Response: OK: success
Looking back over the log since this stopped failing on every attempt, roughly 90% are squeaking by. The average "margin" looks to be considerably less than 10 sec.

Some days my son works from our home and I will use a lot of bandwidth for short bursts. He's not been around much the past couple of days. Our FiOS connection is pretty decent relative to the amount of data we're talking about...
2018-01-24_12-55-18_FiOS_Speed_Test.jpg
In a nutshell, I don't think I've yet found a smoking gun on the network side.

Cheers, and Thanks,
Bob

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 6:12 pm
by Mapantz
I've had this issue for about two weeks

Code: Select all

2018-01-23 22:50:21.867 WeatherCloud update: An error occurred while sending the request.
2018-01-24 00:01:40.654 WeatherCloud update: A task was canceled.
Log files are full of those.

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 6:20 pm
by radilly
Toxic17 wrote:I take it your using a good source for ntp? you have the correct time on your system? I point most of my servers and devices to uk.pool.ntp.org
I've not changed from the defaults, *.debian.pool.ntp.org. Anecdotally I know it tracks much closer than my PC. I fiddled with the PC a little to improve it, as in the "LOCAL TIME" block on the local CMX dashboard I'd sometimes see "updates" in the future by 10-15 seconds. :shock:

My quick and dirty check is to go to https://www.time.gov/ and in a ssh term, run

Code: Select all

watch -n 0.1 date +%T
Putting them side by side, visually, they appear to be well under a second apart.

I should think that's "close enough."

Bob

Re: WeatherCloud update: The Task was canceled

Posted: Wed 24 Jan 2018 6:27 pm
by radilly
Mapantz wrote:I've had this issue for about two weeks

Code: Select all

2018-01-23 22:50:21.867 WeatherCloud update: An error occurred while sending the request.
2018-01-24 00:01:40.654 WeatherCloud update: A task was canceled.
Log files are full of those.
I've not seen the first of those yet. In the other thread I mention in the first post, I listed the unique responses I've seen over the few weeks I've updating to WeatherCloud...

Code: Select all

WeatherCloud Response: InternalServerError: <h1>CException</h1>
WeatherCloud Response: OK: 200
WeatherCloud Response: OK: 429
WeatherCloud update: Connection reset by peer
WeatherCloud update: The Task was canceled
Well, just added the 4th one. That last one dominates, especially over the past 24 hours or so.

Bob