mshockwave's comments

One thing I observed is that RVV code is usually slower in QEMU

LLVM now has another way to implement RTTI using the `CastInfo` trait instead of `classof`: https://llvm.org/doxygen/structllvm_1_1CastInfo.html

But it's really just an implementation difference, the idea is still to have a lightweight RTTI.


how did it do regalloc before instruction selection? How do you select the correct register class without knowing which instruction you're gonna use?


> I don’t know many good reasons for extrusive linked lists

for one, its iterators won't be invalidated when elements are inserted or removed elsewhere in the list


That depends on which array & extrusive linked list classes you’re talking about. Let me put it another way: in three decades of professional coding in scientific computing, video games, film vfx, web programming, and GPU driver and hardware development, I’ve never had to reach for an extrusive linked list for work. I’ve only ever used them for learning, teaching, and toy projects.


Is it normal to spend 10 minutes on tuning nowadays? Do we need to spend another 10 minutes whenever the code changes?


You mean autotuning? I think 10 minutes is pretty normal; torch.compile(mode='max-autotune') can take much longer than that for large models.


Add to that, it can be done just once by developers before distribution, for each major hardware target. The configs are saved, then the right one is selected on the client side.


It's likely that the Swift compiler is using LLVM's lit (https://llvm.org/docs/CommandGuide/lit.html), which is implemented in Python, as the test driver


Python and lit are used heavily to build and test the compiler, but only for building it; you don't need them to download and use the built toolchain. The Python dependency is more about its use in LLDB.
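For context, a lit test is just a text file whose `RUN:` lines lit shells out to. A hypothetical sketch in the Swift test suite's style (the `%target-swift-frontend` and `%s` substitutions are lit's; this exact test is made up):

```
// RUN: %target-swift-frontend -emit-silgen %s | %FileCheck %s

func answer() -> Int { return 42 }
// CHECK: sil {{.*}}answer
```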


> In the end, programs will want probably to stay conservative and will implement only the core ISA

Unlikely; as pointed out in sibling comments, the core ISA is too limited. What might prevail is profiles, specifically profiles for application processors like RVA22U64 and RVA23U64, of which the latter makes a lot more sense IMHO.


Come on, the point was to stick to the core ISA as much as possible.

I had to clarify the obvious: if a program doesn't need more than conservative usage of the ISA to run at reasonable speed, no hardcore hardware change should be investigated.

Additionally, the 'adding new machine instructions' fanboys tend to forget about machine-instruction fusion (they probably want their names in the extension specifications), which has to be investigated first; and often, in such niche cases, it may not be the CPU to think about but specialized ASIC blocks and/or FPGAs.


yes, it has been done for at least a decade if not more

> Even more of a wild idea is to pair up two cores and have them work together this way

I don't think that'll be profitable, because...

> When you have a core that would have been idle anyway

...you'll just schedule in another process. A modern OS rarely runs short of runnable tasks


The article is easy to follow, but I think the author missed the point: branchless programming (a subset of the better-known constant-time programming) is almost exclusively used in cryptography nowadays. As the article's benchmarks show, modern branch predictors have easily achieved over 95%, if not 99%, accuracy for about a decade now


yes, the short answer is LLVM uses RegPressureTracker (https://llvm.org/doxygen/classllvm_1_1RegPressureTracker.htm...) to do all those calculations. Slightly longer answer: I should probably be more specific that in most cases, the Machine Scheduler cares more about the register pressure _delta_ caused by a single instruction, traversing either bottom-up or top-down. In that case it's easier to make an estimate when some of the other instructions haven't been scheduled yet.


