> The difference is that software compilation is a fairly local optimization process - allowing you to change/inline one bit of generated code without significantly affecting the performance of the rest of the generated code

Not really.

The uop cache and L1 instruction cache of modern chips are rather small. You can often grossly increase performance locally by loop unrolling, but if that causes the "hot path" to no longer fit in the uop cache (or L1 cache), then you've traded a chunk of global performance for a small local gain.
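To make that tradeoff concrete, here is a toy cost model. Every number in it is invented for illustration: the cache capacity, the spill penalty, the uop counts, and the function name `hot_path_cycles` are all assumptions, not measurements or specs for any real chip. It only sketches the shape of the problem: unrolling keeps helping until the hot path stops fitting, and then everything on that path gets slower.

```python
# Toy model only: all constants below are assumed, not taken from any real CPU.
UOP_CACHE_CAPACITY = 1536  # assumed uop-cache capacity, in uops
SPILL_SLOWDOWN = 1.5       # assumed slowdown factor once the hot path no longer fits


def hot_path_cycles(unroll, iterations=1000, body_uops=8,
                    overhead_uops=3, other_hot_uops=1400):
    """Estimate cycles for one pass over the hot path, at 1 uop per cycle."""
    # Static footprint: the unrolled loop body plus one copy of the loop overhead.
    loop_static = body_uops * unroll + overhead_uops
    # Dynamic work: the body runs once per element, while the overhead
    # (counter, compare, branch) is paid once per unrolled trip.
    loop_dynamic = iterations * body_uops + (iterations / unroll) * overhead_uops
    dynamic = loop_dynamic + other_hot_uops
    # If the whole hot path spills out of the cache, the entire path pays
    # the penalty, not just the loop that was unrolled.
    fits = loop_static + other_hot_uops <= UOP_CACHE_CAPACITY
    return dynamic if fits else dynamic * SPILL_SLOWDOWN


if __name__ == "__main__":
    for unroll in (1, 4, 16, 32):
        print(unroll, hot_path_cycles(unroll))
```

With these made-up defaults, unrolling by 4 or 16 beats the rolled loop, while unrolling by 32 pushes the hot path past the assumed capacity and ends up slower than not unrolling at all.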

Global vs local optimization is just a tough subject in general, even on CPUs (which are probably easier than FPGAs).



OP wrote "a fairly local optimization". Compared to what a placement algorithm must do for an FPGA, your example is still exactly that, and one that's relatively easy to get right with a few heuristics.



