dwaltrip · 2026-06-13T18:00:07 1781373607

They should follow the principle of least privilege. Why not use differential privacy?

dwaltrip · 2026-06-12T01:25:16 1781227516

Fable is a lot like Opus at its best. It's simply more reliable and feels a bit smarter. For my use cases, using it feels very nice, and notably better than Opus. It needs less direct guidance to get reasonable looking code and I don't have to watch it as closely.

For context, my Claude Code working style is quite heavy on discussion "to align" before implementing anything. We also use a good number of Markdown files.

Oh yeah, it also has far fewer "phrasing quirks" and is a clearer communicator. Opus 4.8 was a bit of a loon with some of its writing styles. I had mostly straightened it out, but not entirely. It would use the most ridiculous flair at times.

willsmith72 · 2026-06-12T06:03:38 1781244218

Yeah, same here, it's a huge step up for me. Curious why people are having such different experiences. Is it just to do with what they're working on? Specific prompt styles (e.g. overfitting on Opus)?

moffkalast · 2026-06-12T11:39:46 1781264386

I would go out on a limb and say it's a garbage in garbage out problem. People just don't define their problem well enough nor provide enough context and are surprised the model can't magically read their mind and summon data that doesn't exist from thin air. There's only so much raw intelligence can compensate for not having literally anything to go on.

10 years ago this was a joke, now it's Tuesday: https://old.reddit.com/r/ProgrammerHumor/comments/2vk4ph/mac...

dwaltrip · 2026-06-13T04:08:27 1781323707

That’s so wild to read that 10 year old meme post. Very prescient. And yes, so accurate! hah

winrid · 2026-06-12T04:29:00 1781238540

I've had Fable add Chinese characters to our conversation for no reason.

elbear · 2026-06-12T11:39:27 1781264367

I've only had that happen with Chinese models until now. Interesting that Fable is doing it too.

winrid · 2026-06-12T18:07:31 1781287651

I've also had Fable successfully build a text editor (quill integration) into a Vaadin project that randomly loses its content after you type a few characters (this is on the 3rd iteration).

sulam · 2026-06-12T13:40:11 1781271611

I’ve had Opus randomly insert (correct) Russian words into responses. It’s like their training data includes some bilingual forums where idiomatic Russian speakers congregate.

maaaaattttt · 2026-06-12T09:51:00 1781257860

Could it be that Anthropic is using the Chinese characters trick to consume fewer tokens behind the scenes?

noddybear · 2026-06-12T12:56:53 1781269013

Aren’t Unicode characters generally treated as 2 tokens to avoid a huge vocabulary?

winrid · 2026-06-12T18:06:03 1781287563

It used a Chinese character instead of the word "true".
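On the token-count question above, a rough sketch of why the answer depends on the tokenizer: byte-level tokenizers can fall back to raw UTF-8 bytes for characters they have not merged into a single token, so the byte length is an upper bound on that fallback cost. The character "真" is only a hypothetical stand-in here (the thread does not say which character appeared), and nothing below reflects the actual tokenizer used by any Claude model.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of `text`."""
    return len(text.encode("utf-8"))


# "真" is an assumed example character, not the one from the thread.
for sample in ("true", "真"):
    print(f"{sample!r}: {len(sample)} code point(s), {utf8_len(sample)} UTF-8 byte(s)")
```

A common English word like "true" is usually a single token in most vocabularies, so swapping it for a CJK character is unlikely to save anything, and may cost more if the character is not in the vocabulary.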

taikon · 2026-06-12T07:35:05 1781249705

Same here

isaacdl · 2026-06-12T11:49:46 1781264986

I dunno, in my limited use, Fable is MORE prone to phrasing quirks. I had it use, for real, the phrase "load-bearing for correctness" yesterday. It meant something about not needing a validation check because something else (the "load-bearing" part) was already checking it.

I do agree that it *feels* nicer and smarter to use.

mpalmer · 2026-06-12T12:37:50 1781267870

I think the tension here is that phrasing like this actually helps keep the model aligned, which is why the training and RL converged on it. But it's so annoying to read!

efromvt · 2026-06-12T15:11:44 1781277104

Repetition of "belt-and-suspenders" kills me with Opus, especially because it always means the model is suppressing something I would want to be an actual failure.

tirutiru · 2026-06-12T05:27:53 1781242073

How did you straighten it out?

I am drowning in gating propagating semantic mismatches...

dwaltrip · 2026-06-12T06:01:30 1781244090

Hah, yeah... I added this to my global CLAUDE.md (~/.claude/CLAUDE.md):

## Writing voice — plain, factual, calibrated to the evidence

Write docs, session notes, commit messages, and findings plainly and factually — and calibrate every claim you assert, in chat as much as in writing. This guards against a known LLM tendency to inflate: toward punchy phrasing and claims that read as more settled than the work supports. Same spirit as the Read-Clean Check above, and composes with it — that rule governs journey-framing, this one governs tone and certainty.

*Plain over punchy.* Skip decorative metaphors and dramatic verbs when a plain word is clearer — call a fix "the change", not "the hammer"; logging "flags" a problem rather than being "radar"; numbers "grow", they don't "explode". Plain phrasing reads as engineering; flourish reads as marketing.

*Calibrated confidence.* Everything stated should be well-reasoned and defensible, with the strength of the wording matched to the strength of the evidence. Prefer "found" / "appears" / "points to" over "proved" / "clearly" / "obviously". Name the confounds and what's still unverified. Don't let a bold lead-in pre-announce a conclusion the work hasn't reached.

*Hypotheses stay labeled as hypotheses.* Speculation and educated guesses are useful — when brainstorming or investigating, surface them, and sharing a strong view is welcome. But conviction is not evidence: until there is clear evidence, a claim is a hypothesis and is stated as one — explicitly, even when it's highly compelling. The failure mode is asserting a hunch as settled fact, where it then propagates unchallenged into later docs and summaries. Back a claim with its evidence in the same breath, or mark it as not-yet-backed.

*Factual and forward-looking.* Separate what was measured from what was inferred, and stay pragmatic about what's true, what's still open, and what's next. On next steps specifically, resist the strong LLM pull to converge prematurely:

- A plausible next step is not a decided one. Don't present one or two plausible tasks as the one path we should now follow — that lock-on is a frequent failure mode.
- Lay out the real options and their trade-offs. Saying which you'd lean toward and why is welcome and useful — but keep the space open and leave the choice to the user.
- Premature certainty about what to do next is as much a miscalibration as premature certainty about what's true.

sulam · 2026-06-12T13:43:43 1781271823

Have you tried optimizing this prompt so that it’s shorter but gets the same results? I see these super verbose prompts all the time from people who learned prompt engineering in the ‘24-early ‘25 timeframe and they seem unnecessary to me (I get good results with 1-3 sentences) but I hate to assume other people’s experience mirrors my own.

dwaltrip · 2026-06-13T02:09:24 1781316564

That's a good idea. Claude wrote that for me a week or so ago. It could definitely be tightened.

dwaltrip · 2026-06-09T19:10:20 1781032220

METR is an independent organization.

dwaltrip · 2026-06-08T17:07:12 1780938432

I'm so sick of people who peddle outrage for a living.

dwaltrip · 2026-06-03T21:29:48 1780522188

The Alaskan sovereign wealth fund is a much better metaphor.

akramachamarei · 2026-06-04T00:18:10 1780532290

I see the comparison--like AI companies are mining value out of public corpora, somewhat like how an oil or mining company extracts resources from the earth. An important difference being that when minerals are extracted they are removed from common use, unlike when a model is trained, which does not subtract from the commons (at least not directly, or substantially?)

dwaltrip · 2026-06-03T15:22:39 1780500159

You aren’t really engaging with the substance or heart of the post, and your reading feels a bit knee-jerky and bad-faith to me.

briandw · 2026-06-10T04:13:51 1781064831

So what was the substance that I missed?

dwaltrip · 2026-06-03T02:42:07 1780454527

The total market size might be low this year or the next. But, for better or worse, humans will continue to push into the unknown.

Reusable rockets will change the economics of space travel beyond recognition. The Jevons paradox will strike hard and fast. Starlink is the initial proof of this.

Maybe Starship will be the first to achieve the fabled dream of rapid reusability. Maybe not. Either way, it’s a tractable engineering problem at this point and the path has been made pretty clear.

I have no idea what the valuation of SpaceX should be. But, in general, I’d bet a lot on the launch industry growing enormously in the coming decades.

ndsipa_pomu · 2026-06-03T08:19:16 1780474756

The increasing amount of space debris will likely change the economics of getting satellites into space and keeping them there. The more junk there is, the more likely that it's going to hit something and create yet more debris.

dwaltrip · 2026-05-28T18:04:26 1779991466

Wait, doesn’t the blog post say the price is the same as 4.7?

> Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Pricing for fast mode is $10 per million input tokens and $50 per million output tokens.

Where do you see the 2x cost?

XCSme · 2026-05-28T18:10:34 1779991834

The total cost of running my benchmarks was 1.6x higher than with Opus 4.7, mostly because it produced 2x the output tokens:

https://i.snipboard.io/vrdwTa.jpg

dwaltrip · 2026-05-28T19:36:51 1779997011

ah ok, thanks for clarifying!

spprashant · 2026-05-28T18:18:29 1779992309

If it spends 2x the tokens to achieve the same result, that's effectively 2x the cost, in a manner of speaking.
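A quick sketch of how the numbers in this sub-thread fit together, using the per-token prices quoted from the blog post ($5 per million input tokens, $25 per million output tokens). The token counts are hypothetical, chosen only to illustrate the arithmetic, and "1.6x" is read here as the ratio of total costs. Doubling output tokens doubles the total bill only when input cost is negligible; otherwise the ratio lands somewhere between 1x and 2x.

```python
INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (quoted above)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (quoted above)


def run_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost in USD of one run at the quoted regular-usage prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def cost_ratio(input_tokens: float, output_tokens: float,
               output_multiplier: float = 2.0) -> float:
    """Ratio of total cost when only the output token count is multiplied."""
    baseline = run_cost(input_tokens, output_tokens)
    inflated = run_cost(input_tokens, output_tokens * output_multiplier)
    return inflated / baseline


# Hypothetical workload: 1M input tokens, 300k output tokens.
print(f"baseline: ${run_cost(1_000_000, 300_000):.2f}")             # $12.50
print(f"ratio with 2x output: {cost_ratio(1_000_000, 300_000):.2f}")  # 1.60
```

With this assumed mix, output accounts for 60% of the baseline cost, which is the share at which 2x output tokens produces a 1.6x total. A more output-heavy workload would sit closer to 2x.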

dwaltrip · 2026-05-28T17:54:00 1779990840

“Grown” is a highly apt metaphor, IMO. It quite succinctly captures some of the most fundamental differences between building Claude and building an Ikea desk, for example.

dwaltrip · 2026-05-28T17:50:34 1779990634

If you are using Claude Code, just set effort to xhigh.

This one change will probably solve 80% of the problems you have noticed.

orwin · 2026-05-28T18:03:44 1779991424

This. XHigh and 'plan' mode for complex tasks are absolutely must-haves.

Still, the context window is sometimes too small for my usage.

jayGlow · 2026-05-28T20:15:18 1779999318

Agent teams can help with that: the main agent acts as an orchestrator and spawns sub-agents to do the actual tasks. It generally keeps the main context from overflowing.

whatevaa · 2026-05-28T20:08:27 1779998907

Isn't xhigh on Opus 4.7 very expensive on tokens?

dwaltrip · 2026-05-28T20:25:52 1779999952

I’ve never run into the limits on the $100 plan, and rarely even get close.

I normally have only one session going at once though.

joshstrange · 2026-05-28T21:08:52 1780002532

Same here and while I have multiple sessions going from time to time, my day isn't spent primarily developing software directly anymore (due to role, nothing about LLMs).

I've only hit the $100/mo limits 1-2 times ever, and it was always <1hr before reset (once it was <5min, the other was ~45min).

I'm even considering going back down to $20 and using extra usage for the times I need to "burst".

sumedh · 2026-05-29T10:49:43 1780051783

Yes, but Anthropic made a deal with SpaceX and increased usage limits by 50%, so you might not hit your limits.