Are there any up-to-date offline/private agentic coding benchmark leaderboards?

If the tests haven't been published anywhere and are sufficiently different from standard problems, I would expect the benchmarks to be robust to intentional over-optimization.

Edit: These look decent and generally match my expectations:

https://www.apex-testing.org/
