Looks like LLMs also find Dafny easier to write than Lean. This study, “A benchmark for vericoding: formally verified program synthesis”, reports:
> We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications … We find vericoding success rates of 27% in Lean, 44% in Verus/Rust and 82% in Dafny using off-the-shelf LLMs.
Not surprising, as Dafny is a bit less expressive (refinement instead of dependent types) and therefore easier to write. IMHO, it hits a very nice sweet spot. The disadvantage of Dafny is the lack of manual tactics to prove things when SAT/SMT automation fails. But this is getting fixed.
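To make the contrast a bit more concrete, here is a minimal Lean 4 sketch (core only, no Mathlib). The names `PosNat`, `onePos` and `swapConj` are made up for illustration. It shows a refinement-style subtype, whose proof obligation automation can discharge, next to a proof written step by step with manual tactics, which is the part Dafny currently lacks. It is only a toy under those assumptions, not a claim about how the benchmark tasks look.

```lean
-- Refinement-style type: a Nat bundled with a proof that it is positive.
-- `PosNat` is a hypothetical name used only for this illustration.
abbrev PosNat := { n : Nat // n > 0 }

-- The side condition `1 > 0` is small enough for automation (`decide`).
def onePos : PosNat := ⟨1, by decide⟩

-- A proof driven by manual tactics, one step at a time.
theorem swapConj (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.2
  · exact h.1
```

In Dafny the analogous side condition would typically be stated as a `requires`/`ensures` clause and handed to the SMT solver, with no tactic script to fall back on if the solver gives up.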
If you haven't already, check out Microsoft's "The Windows® 95 User Interface: A Case Study in Usability Engineering" report summarizing some of the Windows 95 designers' user research:
I read it and it partially inspired the entire project. It made me realise how inaccessible modern design is, despite being held up as best in class and easy to use.
> That would be like Anthropic and Google crying about China stealing the weights that were originally built by scraping as fuck stolen content :-)
Do you really see a relation between the two, or are you just willfully 'buying an advertisement' by trying to shape a metaphor from the social qualms that you wish to rebroadcast to people?
In other words, no -- this isn't at all similar to the companies that steal media in order to train models, only to complain about similar theft from other companies targeted at them -- but I agree with the motivation, fuck em; they're crooks...
But don't weaken metaphors simply to advertise a social injustice. If you want to do that, don't hijack conversations abroad.
This is the first thing that occurred to me. The people above suggesting a COBOL to Python or Go update confuse the heck out of me. Why not just convert to vanilla JavaScript at that point? Bizarre.
GitHub recently added new repository settings to turn off pull requests or limit them to approved contributors. The announcement doesn't mention AI agents, but that's certainly relevant.
GH also needs to find a way to stop AI scraping of IP.
(Or not. It might be lucrative to host some novel algorithm on GH under a license permitting its use in generative LLM results, at a reasonable per-impression fee.)
https://arxiv.org/html/2509.22908v1