


Came across this post, which gives good insight into the 4 golden signals for top-level health tracking: https://blog.netsil.com/the-4-golden-signals-of-api-health-a...

One thing of note in the graph is the tracking of response size. This would be very useful for 200 responses with "Error" in the text, because the response size would drop drastically below a normal successful payload size.
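A rough sketch of that size-drop check, assuming a rolling window over recent 200-response body sizes (the class name, window, and threshold are all made up for illustration, not from the article):

```python
from collections import deque

class ResponseSizeMonitor:
    """Flag 200 responses whose body is drastically smaller than the recent norm.

    A 200 whose body just says "Error" is far smaller than a real payload,
    so a sharp drop in size is a cheap error signal even when the status
    code lies.
    """

    def __init__(self, window=100, drop_ratio=0.2):
        self.sizes = deque(maxlen=window)   # recent healthy 200 body sizes
        self.drop_ratio = drop_ratio        # flag if below 20% of average

    def observe(self, status, body_size):
        suspicious = False
        if self.sizes and status == 200:
            avg = sum(self.sizes) / len(self.sizes)
            # Flag if this body is far below the rolling average size.
            suspicious = body_size < self.drop_ratio * avg
        if status == 200 and not suspicious:
            self.sizes.append(body_size)    # only learn from plausible responses
        return suspicious
```

Real payloads vary, so in practice you'd alert on the rate of suspicious responses rather than any single one.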

In addition to Latency, Error Rates, Throughput and Saturation, folks like Brendan Gregg @ Netflix have recommended tracking capacity.


Are you a plant? It must just be coincidence that the second post in the series is titled "measuring capacity." :) https://honeycomb.io/blog/2017/01/instrumentation-measuring-...

(bias alert - I work on Honeycomb)


we are all learning from the same folks ahead of us it seems :)

I agree with other comments though: the devil is in the details of how to actually set up these "golden signals" so that they are useful and don't just drown everyone in packet-level nonsense.


TCP retransmission rate looks like a useful metric for monitoring the health of a service. One way to obtain it is by analyzing service interactions, as mentioned in the blog. Tracing could be another way to find that info. I am curious how code-instrumented monitoring solutions get that information. (PS: I work for Netsil)


By default you can only get that per-kernel, from /proc/net/snmp. BPF may allow something more granular.

The other way of approaching it is to look for the additional latency it causes, which you can spot on a per-service basis.


Additional latency could be an indicator, but there's no guarantee that it is because of retransmissions?


If you look at your latency histogram and see a bump around 200ms above normal (which was the default minimum retransmission timeout a few years back, anyway), it's probably retransmits.


Got it.


You can get retransmit counts from 'sar' on Linux.
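e.g. `sar -n ETCP` reports a retrans/s column. A small sketch that pulls that column out of sar's text output (the column layout is assumed from sysstat's ETCP report; your sar version may differ):

```python
def parse_sar_etcp_retrans(sar_output):
    """Pull the retrans/s column out of `sar -n ETCP` text output.

    Returns a list of per-interval retransmission rates. Locates the
    column by name from the header line, then reads it from each
    timestamped data line.
    """
    rates = []
    col = None
    for line in sar_output.splitlines():
        fields = line.split()
        if "retrans/s" in fields:
            col = fields.index("retrans/s")   # header line names the columns
        elif col is not None and fields and fields[0][:1].isdigit():
            try:
                rates.append(float(fields[col]))
            except (IndexError, ValueError):
                pass                          # skip malformed lines
    return rates
```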


I see. But it looks like it is per-host, and there is no way to get it for a particular service running on the host.


Right now we maintain a few select percentiles from the latency distribution over a 1-minute period. We plan to maintain latency histograms, which will allow you to look at the latency distribution over arbitrary time intervals.
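For anyone curious why histograms enable this: fixed-bucket histograms from adjacent windows merge by just adding counts, and percentiles can then be approximated over any span. A toy sketch (the bucket bounds are my own choice, not Netsil's scheme):

```python
import bisect

# Fixed exponential-ish bucket upper bounds in ms (illustrative only).
BOUNDS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]

def to_histogram(latencies_ms):
    """Bucket one window of latencies into counts per bound."""
    counts = [0] * (len(BOUNDS) + 1)        # last bucket = overflow
    for x in latencies_ms:
        counts[bisect.bisect_left(BOUNDS, x)] += 1
    return counts

def merge(h1, h2):
    """Histograms over disjoint windows add bucket-wise; this is what
    makes percentiles over arbitrary time intervals possible."""
    return [a + b for a, b in zip(h1, h2)]

def percentile(hist, p):
    """Approximate the p-th percentile as the upper bound of the bucket
    where the cumulative count crosses p percent of the total."""
    target = p / 100 * sum(hist)
    cum = 0
    for i, c in enumerate(hist):
        cum += c
        if cum >= target:
            return BOUNDS[i] if i < len(BOUNDS) else float("inf")
    return float("inf")
```

The tradeoff versus stored percentiles: bounded error from the bucket widths, in exchange for mergeability (you can't average two p99s, but you can add two histograms).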


Any information on pricing?


Netsil AOC is priced by the number of vCPUs or cores that you would be monitoring. You can reach out to us at hello@netsil.com for the exact price quote based on your needs.

