I think the emphasis on coding/math is just because those are the low-hanging fruit - they are relatively easy to provide reasoning verification for, both for training purposes and for benchmark scoring. The fact that you can then brag about how good your model is at math, which seems like a high-intelligence activity (at least when done by a human), doesn't hurt either!
Reasoning verification in the general case is harder - "LLM as judge" (ask an LLM if it sounds right!) seems to be the general solution.
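For what it's worth, the basic shape of LLM-as-judge is just a second model call over the first model's output. A minimal sketch, where `ask_llm` is a hypothetical completion function (prompt in, text out) you'd swap for a real API client:

```python
# Minimal sketch of "LLM as judge" reasoning verification.
# `ask_llm` is a hypothetical callable (prompt -> completion text).

def judge_reasoning(ask_llm, question, answer):
    """Ask a judge model whether the proposed reasoning sounds right."""
    prompt = (
        "Question:\n" + question + "\n\n"
        "Proposed answer and reasoning:\n" + answer + "\n\n"
        "Does the reasoning hold up? Reply with exactly CORRECT or INCORRECT."
    )
    verdict = ask_llm(prompt).strip().upper()
    return verdict.startswith("CORRECT")

# Stub judge for demonstration only: flags an obvious arithmetic error.
def stub_llm(prompt):
    return "INCORRECT" if "2 + 2 = 5" in prompt else "CORRECT"

print(judge_reasoning(stub_llm, "What is 2 + 2?", "2 + 2 = 5"))  # False
print(judge_reasoning(stub_llm, "What is 2 + 2?", "2 + 2 = 4"))  # True
```

Of course, this just pushes the verification problem up a level - the judge is only as reliable as the judge model's own reasoning.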