Are there any up-to-date offline/private agentic coding benchmark leaderboards?
If the tests haven't been published anywhere and are sufficiently different from standard problems, I would expect the benchmarks to be robust to intentional over-optimization.
Edit: These look decent and generally match my expectations:
https://www.apex-testing.org/