Interesting, and the author is clearly an expert on the topic. However, I'm curious how this fits into a sensible program architecture. When do you need such a queue? It seems quite unlikely that production and consumption operate at the exact same rate and with the same CPU usage, such that the queue runs in a steady state of neither full nor empty. It seems much more likely that the queue will just be either full or empty all the time, in which case this algorithm will be terrible, because it busy-waits instead of using condition variables that can wake the consumers or block the producers. By the way, blocking the producers is also known as flow control, which many people believe is a desirable property of software.
If your queue is normally empty, it would be fine for the producing threads to just jump straight into the consuming routine and never mind the queue. If the queue is normally full, it's fine for them to just take a lock and sleep on a condition variable.
Normally what you want a p/c queue for is to paper over short-term variance in a system where, on long time scales, the consumer side is faster than the producer side. In that role, an ordinary queue with a mutex will work fine. If the consumer is stalled for long enough to fill the queue, blocking the producer is exactly what you want.
Of course, some languages favor the use of this pattern. I'm thinking of Go channels. Totally lock-free Go channels would be pretty awesome, but busy-waiting in select would not be awesome.
This is a good question. We've currently integrated the queue into two projects in production: a user-space proxy server and a VPN capturer (a Linux kernel module).
In both cases we enhanced the queue with checks such as whether it is empty or full, so we can drop packets when the consumers are overloaded and there is no sense in putting a packet into the queue.
Secondly, the queue is designed for multi-core environments, which on modern hardware usually means multi-node NUMA systems. So we adjusted both applications so that an administrator can assign a different number of cores to consumers and producers depending on the current workload. This keeps the system balanced, so the queue is rarely empty or full.
And finally, for kernel space we also implemented a lightweight, likewise lock-less, condition wait for the queue (http://natsys-lab.blogspot.ru/2013/08/lock-free-condition-wa...). It also gives us lower CPU consumption when there is no workload (so the system uses less power) and even better performance due to reduced cache bouncing.
…in which case this algorithm will be terrible because it busy-waits instead of having condition variables…
I'm seeing sched_yield() calls in there. It looks like a blocked process will yield its CPU core to available, productive work.
If there isn't enough work to keep all the CPU cores busy, then it will spin around asking "Am I ready?" more often than required, but at that point you have a machine under less than full load and it doesn't really matter (power use aside, and assuming the spinning threads stay in cache).