More

vonneumannstan · 2026-04-10T15:48:56 1775836136

>This initiative probably could have started a few months sooner with Opus and similar models, though.

Evidently they tried and even the most recent Opus 4.6 models couldn't find much. Theres been a step change in capabilities here.

causal · 2026-04-10T15:54:21 1775836461

No, Opus has found a lot and 112 vulnerabilities were reported to Firefox alone by Opus [0]. But Mythos is uniquely capable of exploiting vulnerabilities, not just finding them.

[0] https://red.anthropic.com/2026/mythos-preview/

vonneumannstan · 2026-04-13T14:57:53 1776092273

Doesn't even seem to be in the same ballpark of capability.

https://red.anthropic.com/2026/mythos-preview/FRT-Blog-Chart...

vonneumannstan · 2026-04-10T15:48:04 1775836084

I think that would be a good precedent given the current lack of rules around AI Safety. These models don't seem to be plateauing yet and could be much more dangerous than Mythos in 1-2 years.

vonneumannstan · 2026-04-10T15:46:28 1775835988

They seem like fundamentally different things.

vonneumannstan · 2026-04-08T20:33:39 1775680419

Perhaps feeding strangers homemade GMOs isn't the best idea...

vonneumannstan · 2026-04-07T19:16:44 1775589404

>If we don't innovate, someone else will.

Terrible take. You don't get to push the extinction button just because you think China will beat you to the punch.

>This is the very nature of being a human being. We summit mountains, regardless of the danger or challenge.

No, just no... We barely survived the Cold War, at times because of pure luck. AI is at least as dangerous as that, if not more. We have far exceeded our wisdom relative to our capabilities. As you have so cleanly demonstrated.

dist-epoch · 2026-04-07T20:59:36 1775595576

You assume there is the option of not pushing the extinction button. Nobody asked chimps if they wanted humans around. This processes are outside control.

hollerith · 2026-04-09T01:25:17 1775697917

Humanity stopped germ-line human genetic engineering (possible since the early 1970s) and humanity can (and should) stop OpenAI, Anthropic, etc.

Datacenters that use literal gigawatts of electricity are not exactly easy to conceal from the authorities.

vonneumannstan · 2026-04-08T20:04:38 1775678678

Until recently Claude wasn't building itself. A group of people with agency were.

vonneumannstan · 2026-04-07T18:55:25 1775588125

Are you guys ready for the bifurcation when the top models are prohibitively expensive to normal users? If your AI budget $2000+ a month? Or are you going to be part of the permanent free tier underclass?

adi_kurian · 2026-04-07T19:05:35 1775588735

If one is to believe the API prices are reasonable representation of non subsidized "real world pricing" (with model training being the big exception), then the models are getting cheaper over time. GPT 4.5 was $150.00 / 1M tokens IIRC. GPT o1-pro was $600 / 1M tokens.

vonneumannstan · 2026-04-07T19:27:41 1775590061

You can check the hardware costs for self hosting a high end open source model and compare that to the tiers available from the big providers. Pretty hard to believe its not massively subsidized. 2 years of Claude Max costs you 2,400. There is no hardware/model combination that gets you close to that price for that level of performance.

adi_kurian · 2026-04-07T19:41:16 1775590876

Yes that's why I said API price. I once used the API like I use my subscription and it was an eye watering bill. More than that 2 year price in... a very short amount of time. With no automations/openclaw.

lostmsu · 2026-04-08T16:00:54 1775664054

Are you considering batch inference?

OsrsNeedsf2P · 2026-04-07T19:13:14 1775589194

Inference for the same results has been dropping 10x year over year[0]

[0] https://ziva.sh/blogs/llm-pricing-decline-analysis

ceejayoz · 2026-04-07T19:18:51 1775589531

Sure, but "the same results" will rapidly become unacceptable results if much better results are available.

hibikir · 2026-04-07T19:34:53 1775590493

When we go with any other good in the economy, price is always relevant: After all, the price is a key part of any offering. There are $80-100k workstations out there, but most of us don't buy them, because the extra capabilities just aren't worth it vs, say a $3000 computer, and or even a $500 one. Do I need a top specialist to consult for a stomachache, at $1000 a visit? Definitely not at first.

There's a practical difference to how much better certain kinds of results can be. We already see coding harnesses offloading simple things to simpler models because they are accurate enough. Other things dropped straight to normal programs, because they are that much more efficient than letting the LLM do all the things.

There will always be problems where money is basically irrelevant, and a model that costs tens of thousand dollars of compute per answer is seen as a great investment, but as long as there's a big price difference, in most questions, price and time to results are key features that cannot be ignored.

esafak · 2026-04-07T19:31:22 1775590282

Or will they rapidly become indistinguishable since they both get the job done?

swader999 · 2026-04-07T19:37:43 1775590663

Yes, it will always be an arms race game.

asadm · 2026-04-07T20:19:30 1775593170

if it can pay my rent, why not?

vonneumannstan · 2026-04-07T18:54:02 1775588042

Lol you haven't used a model since GPT2 is what it sounds like.

skippyboxedhero · 2026-04-07T19:07:04 1775588824

Just checked my subscription start date for Anthropic. September 2023, I believe before they announced public launch.

Sorry kid.

SyneRyder · 2026-04-07T19:32:12 1775590332

Genuine question - if you don't think the models are improved or that the code is any good, why do you still have a subscription?

You must see some value, or are you in a situation where you're required to test / use it, eg to report on it or required by employer?

(I would disagree about the code, the benefits seem obvious to me. But I'm still curious why others would disagree, especially after actively using them for years.)

skippyboxedhero · 2026-04-07T19:48:28 1775591308

The assumption that the other person made was that I would only use it for coding. If you look through my other comments today, I suggest that they are useful for performing repetitive tasks i.e. checking lint on PR, etc. Also, can be used for throwaway code, very useful.

I don't think the issue is with the model, it is with the implication that AGI is just around the corner and that is what is required for AI to be useful...which is not accurate. The more grey area is with agentic coding but my opinion (one that I didn't always hold) is that these workflows are a complete waste of time. The problem is: if all this is true then how does the CTO justify spending $1m/month on Anthropic (I work somewhere where this has happened, OpenAI got the earlier contract then Cursor Teams was added, now they are adding Anthropic...within 72 hours of the rollout, it was pulled back from non-engineering teams). I think companies will ask why they need to pay Anthropic to do a job they were doing without Anthropic six months ago.

Also, the code is bad. This is something that is non-obvious to 95% of people who talk about AI online because they don't work in a team environment or manage legacy applications. If I interview somewhere and they are using agentic workflow, the codebase will be shit and the company will be unable to deliver. At most companies, the average developer is an idiot, giving them AI is like giving a monkey an AK-47 (I also say this as someone of middling competence, I have been the monkey with AK many times). You increase the ability to produce output without improving the ability to produce good output. That is the reality of coding in most jobs.

AI isn't good enough to replace a competent human, it is fast enough to make an incompetent human dangerous.

carbon_14 · 2026-04-08T17:22:53 1775668973

so true.

vonneumannstan · 2026-04-07T19:18:29 1775589509

So you are doubly stupid, by not seeing any improvement in the models and also paying for models you believe are terrible? lol

skippyboxedhero · 2026-04-07T19:22:40 1775589760

That doesn't follow logically from what I said. You should ask your AI for help with this. You are in need of some artificial intelligence.

vonneumannstan · 2026-04-06T16:59:58 1775494798

This seems like something uniquely suited to the startup ecosystem. I.e. offering PQ Encryption Migration as a Service. PQ algorithms exist and now theres a large lift required to get them into the tech with substantial possible value.

hlieberman · 2026-04-06T17:19:29 1775495969

… really? This is simultaneously so far down in the plumbing and extremely resistant to measuring the impact of, I can’t imagine anyone building a company off of this that’s not already deep in the weeds (lookin’ at you, WolfSSL).

The idea that a startup would be competitive in the VC “the only thing that matters are the feels” environment seems crazy to me.

OhMeadhbh · 2026-04-06T17:34:31 1775496871

Yeah... I spent the 90s working for RSADSI and Certicom implementing algorithms. Crypto is a vitamin, not an aspirin. Hardly anyone is capable of properly assessing risk in general, much less the technical world of information risk management. Telling someone they should pay you money to reduce the impact of something that may or may not happen in the future is not a sales win.

vonneumannstan · 2026-04-03T13:32:28 1775223148

Disagree, there are a lot of reasons to use open source local LLMs that aren't related to free/libre/oss principles. Privacy being a major one.

ekianjo · 2026-04-03T15:22:33 1775229753

If you care about privacy making sure the closed source software does not call home is a concern...

the_lucifer · 2026-04-03T17:37:42 1775237862

I run Little Snitch[1] on my Mac, and I haven't seen LM Studio make any calls that I feel like it shouldn't be making.

Point it to a local models folder, and you can firewall the entire app if you feel like it.

Digressing, but the issue with open source software is that most OSS software don't understand UX. UX requires a strong hand and opinionated decision making on whether or not something belongs front-and-center and it's something that developers struggle with. The only counterexample I can think of is Blender and it's a rare exception and sadly not the norm.

LM Studio manages the backend well, hides its complexities and serves as a good front-end for downloading/managing models. Since I download the models to a shared common location, If I don't want to deal with the LM Studio UX, I then easily use the downloaded models with direct llama.cpp, llama-swap and mlx_lm calls.

[1]: https://obdev.at

vonneumannstan · 2026-03-30T15:52:09 1774885929

Pretty sure the entire markets for Storage, HBM, DDR5, etc are completely sold out for next several years. How is that saturated?