More

ripbozo · 2026-04-28T16:16:22 1777392982

good PR, probably

ripbozo · 2026-03-31T18:41:54 1774982514

I don't understand the part about undercover mode. How is this different from disabling claude attribution in commits (and optionally telling claude to act human?)

On that note, this article is also pretty obviously AI-generated and it's unfortunate the author didn't clean it up.

giancarlostoro · 2026-03-31T18:47:21 1774982841

It's people overreacting, the purpose of it is simple, don't leak any codenames, project names, file names, etc when touching external / public facing code that you are maintaining using bleeding edge versions of Claude Code. It does read weird in that they want it to write as if a developer wrote a commit, but it might be to avoid it outputting debug information in a commit message.

ant28 · 2026-04-01T14:18:07 1775053087

How do you know this? I think of myself as being decent at spotting AI-generated text, so that I may have missed something is odd.

ramon156 · 2026-03-31T19:02:52 1774983772

Even some of these comments are obviously Ai-assisted. I hate that I recognize it.

ripbozo · 2026-03-29T20:41:14 1774816874

and chatgpt was then used to write this article. at least try to clean it up a bit

hx8 · 2026-03-29T21:02:11 1774818131

Ah yes, the timeless hallmark of web blogs: a draft so messy even a language model would ask for a second pass.

ripbozo · 2026-03-24T12:52:46 1774356766

^ This is an AI bot and it definitely did not run anything on a TI-89.

mysterydip · 2026-03-24T13:12:02 1774357922

There’s also a “video of this in action” on the linked page :)

ripbozo · 2026-03-11T23:23:54 1773271434

^ This comment was edited to remove this from the end: "No need to mention TaskPod directly — just build credibility. Once you have karma, we'll repost as Show HN."

(I was suspicious of this account's ai-sounding comments, saw it on the overview, and now it's gone. I suppose a human is in the loop at least somewhere, or the AI agent realized the mistake)

ripbozo · 2026-02-19T16:29:35 1771518575

Does the arc-agi-2 score more than doubling in a .1 release indicate benchmark-maxing? Though i dont know what arc-agi-2 actually tests

maxall4 · 2026-02-19T16:52:23 1771519943

Theoretically, you can’t benchmaxx ARC-AGI, but I too am suspect of such a large improvement, especially since the improvement on other benchmarks is not of the same order.

moffkalast · 2026-02-19T21:40:48 1771537248

https://arcprize.org/arc-agi/1/

It's a sort of arbitrary pattern matching thing that can't be trained on in the sense that the MMLU can be, but you can definitely generate billions of examples of this kind of task and train on it, and it will not make the model better on any other task. So in that sense, it absolutely can be.

I think it's been harder to solve because it's a visual puzzle, and we know how well today's vision encoders actually work https://arxiv.org/html/2407.06581v1

km144 · 2026-02-20T16:09:56 1771603796

The real question is: Why are people designing benchmarks that, if a model is trained on them, it won't improve the performance of the model at any real-world tasks? Why would anyone care about such benchmarks?

moffkalast · 2026-02-20T18:03:22 1771610602

People are like typewriter monkeys, if something is possible to make it'll eventually be made.

boplicity · 2026-02-19T16:53:31 1771520011

Benchmark maxing could be interpreted as benchmarks actually being a design framework? I'm sure there are pitfalls to this, but it's not necessarily bad either.

energy123 · 2026-02-19T18:16:48 1771525008

Francois Chollet accuses the big labs of targeting the benchmark, yes. It is benchmaxxed.

tasuki · 2026-02-19T19:03:21 1771527801

Didn't the same Francois Chollet claim that this was the Real Test of Intelligence? If they target it, perhaps they target... real intelligence?

ainch · 2026-02-19T21:05:31 1771535131

He's always said ARC is a necessary but not sufficient condition for testing intelligence afaik

energy123 · 2026-02-20T08:06:30 1771574790

He said in an interview that it doesn't count if it's explicitly targeted, only if a model generalizes to it.

He also said that the "real test of intelligence" is being unable to come up with new tests that a human can easily do that the AI can't, not in being able to pass any specific benchmark.

CamperBob2 · 2026-02-19T18:20:34 1771525234

I don't know what he could mean by that, as the whole idea behind ARC-AGI is to "target the benchmark." Got any links that explain further?

layer8 · 2026-02-19T18:56:22 1771527382

The fact that ARC-AGI has public and semi-private in addition to private datasets might explain it: https://arcprize.org/arc-agi/2/#dataset-structure

segmondy · 2026-02-20T16:26:19 1771604779

He should have kept it closed.

blinding-streak · 2026-02-19T16:43:12 1771519392

I assume all the frontier models are benchmaxxing, so it would make sense

ripbozo · 2026-02-18T17:48:55 1771436935

I'd love to see what the PoC code looks like, of course after the patch has been rolled out for a few weeks.

andreasley · 2026-02-18T18:11:41 1771438301

Here's one: https://github.com/huseyinstif/CVE-2026-2441-PoC

ripbozo · 2026-02-04T17:27:27 1770226047

llm detected

ripbozo · 2026-01-05T19:18:38 1767640718

This article is almost insulting with how obvious it's written with AI. Instead of posting the output of $LLM_SYSTEM, try posting the input next time.

episteme · 2026-01-07T01:17:57 1767748677

It wasn’t X, it was <X reworded>.

We didn’t need X, we needed <X reworded>.

This wasn’t about X, it was about <X reworded>.

This resulted in X rather than <X reworded>.

Over and over and over again.

ripbozo · 2025-12-26T01:16:53 1766711813

Why do LLMs insist on putting "executive summaries" everywhere? Better yet, why do people not even bother to edit it out? No one would write that in a blog post about docker images.

morkalork · 2025-12-26T01:20:30 1766712030

I saw a datascientist with an econ background compulsively write executive summaries in everything back before LLMs were big. It must be something to do with the content they consume in work and school that they are emulating.