> The term latency in this context refers to ... the time between when a request was submitted to a queue and when the worker thread finished processing the request.
Since this is a tuning guide, I would have liked to see a separation of 3 attributes:
* Service-time: The actual time taken by the thread/process once it begins processing a request.
* Latency: The time spent by the request waiting to get processed (e.g., languishing in a queue on the client or server side). This is when the request was latent.
* Response time: Service-time + Latency as recorded by the server. From a client's POV, this would additionally include overhead from the network medium, etc.
Most performance models isolate these to get a deeper sense of where the bottlenecks are. When there's just a single queue for everything, it makes sense to make the service-time as short as possible. But if you have multiple workload-based queues, you can do more interesting things.
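To make the separation concrete, here's a minimal sketch of how a server-side worker might record the three attributes independently. Everything here (`do_work`, the sentinel shutdown, the `measurements` list) is illustrative, not from the guide; client-observed response time would additionally include network overhead not measured here:

```python
import queue
import threading
import time

requests = queue.Queue()
measurements = []  # (queue_time, service_time, response_time) per request

def do_work(payload):
    # stand-in for real request processing (hypothetical handler)
    time.sleep(0.005)

def worker():
    while True:
        enqueued_at, payload = requests.get()
        if payload is None:                       # sentinel: stop the worker
            break
        start = time.monotonic()
        queue_time = start - enqueued_at          # latency: time spent latent in the queue
        do_work(payload)
        service_time = time.monotonic() - start   # actual processing time
        # server-side response time; a client would also see network overhead
        measurements.append((queue_time, service_time, queue_time + service_time))

threading.Thread(target=(lambda: worker())).start()
for i in range(3):
    requests.put((time.monotonic(), i))
requests.put((time.monotonic(), None))
```

With per-request tuples like these you can histogram queue time and service time separately and see which one is actually driving tail response times.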
This guide pretty much tells you how to make the Linux kernel interfere as little as possible with your application. How to instrument and what to measure would depend on the application.
I agree that measuring queuing delay and processing delay separately makes sense.
This is a great point. For the purposes of queuing theory analysis, some separate out latency from response time in which case response time is just service time + queue time, and latency is transit time before arriving at the queue.
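As a rough illustration of why the split matters for queuing-theory analysis, the standard M/M/1 formulas (an assumption here, not something the guide discusses) show queue time dominating response time well before the server is saturated:

```python
# M/M/1 queue: arrivals at rate lam, a single server at rate mu
lam, mu = 80.0, 100.0

service_time = 1.0 / mu                    # 0.010 s per request
utilization = lam / mu                     # 0.8
response_time = 1.0 / (mu - lam)           # 0.050 s = service + queue time
queue_time = response_time - service_time  # 0.040 s spent waiting

# At 80% utilization, 4/5 of the server-side response time is queueing,
# which a service-time-only measurement would completely miss.
```

So if you only instrument service time, the system can look healthy while response times blow up.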