> why do those things have a (statistically significant?) impact in the first place?
In a word, caches. Not just the instruction and data caches, but also page faults and micro-architectural features such as micro-op caches, instruction TLB entries, loop stream buffers, cache-line alignment, and aliasing in the branch predictor tables (which can also be thought of as caches).
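To make the branch-predictor aliasing point concrete, here's a toy model. It's a sketch under loudly stated assumptions: a single table of 2-bit saturating counters indexed only by `(pc >> 2) % table_size`. Real predictors mix in branch history, tags, and multiple tables, so treat `simulate` and its indexing scheme as hypothetical illustrations. Two branches with opposite behavior either get their own counters or end up fighting over one, depending purely on where they land in memory.

```python
def simulate(pc_a, pc_b, table_size=16, iterations=1000):
    """Toy branch predictor: count mispredictions for two branches.

    Hypothetical model: one 2-bit saturating counter per table entry,
    indexed by the low bits of the branch address. Counter values 0-1
    predict not-taken, 2-3 predict taken. Branch A is always taken,
    branch B is never taken.
    """
    table = [1] * table_size  # every counter starts "weakly not-taken"
    mispredicts = 0
    for _ in range(iterations):
        for pc, taken in ((pc_a, True), (pc_b, False)):
            idx = (pc >> 2) % table_size  # assumed indexing scheme
            predicted_taken = table[idx] >= 2
            if predicted_taken != taken:
                mispredicts += 1
            # Update the saturating counter toward the actual outcome.
            if taken:
                table[idx] = min(table[idx] + 1, 3)
            else:
                table[idx] = max(table[idx] - 1, 0)
    return mispredicts


# Adjacent branches map to different entries: only one warm-up miss.
print(simulate(0x1000, 0x1004))  # 1
# Same two branches, but 64 bytes apart: they share entry 0 and thrash.
print(simulate(0x1000, 0x1040))  # 2000
```

Nothing about the branches' logic changed between the two calls, only their addresses. That's the kind of thing an unrelated edit, a different link order, or a changed environment size can do to a benchmark.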
I suppose I was musing more along the lines of "why isn't this a solved problem?" Clearly it isn't an easy one, or compilers would already take this into account and the statistical variance would be smaller.