Yup. That post is a typical example, symptomatic of modern technology culture, of calling for humans to change their nature in response to technology.
This is a fundamental mistake. It’s always the job of technology (indeed, its most important job) to work within the constraints of human nature, not the other way round. Being unable to do that is the defining characteristic of bad technology.
People keep throwing this phrase around in relation to LLMs, when not a single “fundamental limitation” has been rigorously demonstrated to exist, and many tasks that were claimed to be impossible for LLMs two years ago supposedly due to “fundamental limitations” (e.g. character counting or phonetics) are non-issues for them today even without tools.
The models now waste a vast number of neurons uselessly memorising the character counts of the entire English language so that people can ask how many r's are in strawberry and check a tickbox in a benchmark.
The architecture cannot efficiently or consistently represent counting letters in words. We should never have force-trained them to do it.
This goes for other, more important "skills" that are unsuited to transformer models.
Most models can now do decent arithmetic. But if you knew how that ability is encoded in their neurons, you would never, ever trust any arithmetic they output, even when they seem to "know" it (unless they called a calculator MCP to do it).
There are fundamental limitations, but we're currently brute forcing ourselves through problems we could trivially solve with a different tool.
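For example, here's a minimal sketch of the kind of trivial tool I mean; the function name and shape are made up for illustration, not taken from any particular MCP server:

    from operator import add, sub, mul, truediv

    # Hypothetical calculator "tool" an LLM could call instead of doing
    # arithmetic in its weights. Exact every time, and auditable.
    OPS = {"+": add, "-": sub, "*": mul, "/": truediv}

    def calculate(a: float, op: str, b: float) -> float:
        return OPS[op](a, b)

    print(calculate(1234, "*", 5678))  # 7006652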
> The models now waste a vast number of neurons uselessly memorising the character counts of the entire English language
No they don’t. They only need to know the character count for each token, and with typical vocabularies having around 250k entries, that’s an insignificant number for all but the tiniest LLMs.
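Back-of-the-envelope (my rough numbers, not measured from any model), even if you stored a count for every letter of the alphabet per token:

    vocab_size = 250_000          # typical modern tokenizer vocabulary
    letters = 26                  # one per-letter count for each token
    print(vocab_size * letters)   # 6500000 -- a few million values,
                                  # versus billions of weights in a frontier model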
Those "tolkens" humans "count" are translated to a ~2048 (depends on model) floating point vector.
bird => {mamal, english, noun, Vertebrate, aviant} has one r but what if you make it 20% more "french". Is is still 1 r? That could be the word "bird" in french, or it could be a french speaking bird or a bird species common in france.
If nearest neibour distance to the vocabulary of every language makes the vector no longer map to "bird"; then the amount of rs' must change, using a series of trained conditional checks (with some efficiency where languages have some general spelling patterns).
That is such an unreasonable amount of compute, that it is likley faar cheaper, easier and more reliable to train the model to memorise the output:
{"MCP":"python", "content":"len((c for c in 'strawberry' if c='r'))"}
The attention mechanism allows LLMs to learn these kinds of absurdly inefficient calculations. But we really shouldn't use LLMs where they're outperformed by trivial existing solutions.
>People keep throwing this phrase around in relation to LLMs, when not a single “fundamental limitation” has been rigorously demonstrated to exist
Some limitations haven't been rigorously demonstrated to be fundamental, but they have been continuously present since the earliest LLMs. Shouldn't the burden of proof be on those who say they can be overcome?
And some limitations are fundamental, and have been rigorously demonstrated, e.g.:
What part of "Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all the computable functions and will therefore inevitably hallucinate if used as general problem solvers." fails to support the title, to put it mildly?
As with all works that use too broad a definition of an LLM, they prove too much. This work defines an "LLM" as a computable function obtained by applying a finite number of steps of a generic algorithm to an initial computable function.
What they really prove is that it's impossible to extrapolate an unconstrained, non-continuous function from a finite subset of its values. Good for them, I guess.
It's like saying that the no-free-lunch theorems prove that LLMs can't be the best optimizers, when what they actually prove (roughly) is that the best optimizer doesn't exist. That is, even people aren't the best optimizers, but we manage somehow, so LLMs can too.
So substitute another phrase, if you prefer. It doesn't change the logic.
"Specifically, we define a formal world where bungling is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all the computable functions and will therefore inevitably bungle if used as general problem solvers."
Character counting remains a huge issue without tools.
Are you using only frontier models that are gated behind openai/anthropic/google APIs? Those use tools to help them out behind the scenes. It remains no less impressive, but I think we should be clear.
The literal best public models still fail to count characters consistently in practice, so I’m not sure what you mean. It’s literally a problem we’re still trying to solve at work.
What's amazing is that they can even fairly reliably appear to count characters. I mean, we're talking about systems that infer sequences, not character counters or calculators. They are amazing in unrelated ways, and we need to accept this so we can use them effectively.
I suspect character counting - counting small numbers in general in fact - is something that multimodal models will gradually learn through their visual capabilities. We have generative systems that are capable of generating an image of the word ‘strawberry’, and of counting how many strawberries are visible in an image; seems likely it’s possible for an LLM to ‘imagine’ what the word strawberry looks like and count the ‘Rs’ it can ‘see’.
That’s false. Larger LLMs learn token decompositions through their training, and in fact modern training pipelines are designed to occasionally produce uncommon tokenizations (including splitting words into individual characters) for this reason. Frontier models have no trouble spelling words even without tools. Even many mid-sized models can do that.
Wait, where can I learn more about this? I don't doubt that varying the tokenization during training improves results, but how does/would that enable token introspection?
Because LLMs can learn that different token sequences represent the same character sequence from training context. Just like they learn much more complex patterns from context.
You can try this out locally with any mid-sized current-gen LLM. You’ll find that it can spell out most atomic tokens from its input just fine. It simply learned to do so.
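To make that concrete, here's a minimal sketch of the kind of augmentation I mean, in the spirit of BPE-dropout; it's purely illustrative, and real pipelines do this inside the tokenizer rather than on whitespace-split words:

    import random

    def occasionally_spell_out(words, p=0.05):
        # With probability p, replace a word with its individual characters,
        # so the training data pairs whole-word tokens with their spellings.
        out = []
        for w in words:
            out.extend(list(w) if random.random() < p else [w])
        return out

    random.seed(0)
    print(occasionally_spell_out("how many r s are in strawberry".split(), p=0.5))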
Character counting errors are a side effect of tokenization, which is a performance optimization. If we scaled the hardware big enough we could train on raw bytes and avoid it.
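You can see what the model actually receives with a quick sketch using the tiktoken package (assuming it's installed; the exact splits depend on the encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    print(ids)                              # a short list of opaque integer IDs
    print([enc.decode([i]) for i in ids])   # the sub-word pieces those IDs stand for
    # The model sees only the integer IDs, never the letters inside them,
    # so counting r's has to be memorised rather than read off the input.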
This is kind of my point: we need to get better at describing the limitations and studying them. It seems extremely clear that there are limitations, and not just temporary ones, but structural limitations that existed at the beginning and continue to persist.
When I talk about fundamental limitations, I mean limitations that can't be solved, even if they could be improved.
We have improved hallucinations significantly, and yet it seems clear that they are inherent to the technology and so will always exist to some extent.
As a general architecture, an LLM also has limitations that can't be improved unless we switch to another, fundamentally different AI design that's not LLM-based.
There are also limitations due to maths and/or physics that aren't fixable under any design. Outside science fiction, there is no technology whose limitations are all fixable.
Am I misreading that paper? They define hallucinations as anything other than the correct answer and prove that there are infinitely many questions an LLM can't answer correctly, but that's true of any architecture: there are infinitely many problems a team of geniuses with supercomputers can't answer. If an LLM can be made to reliably say "I don't know" when it doesn't know, hallucinations are solved; they contend that this doesn't matter because you can keep drawing from your pile of infinite unanswerable questions and the LLM will either never answer or will make something up. Seems like a technically true result that isn't usefully true.
I appreciate you acknowledging that this was a mistake, but as you surely know from your own experience with other people’s mistakes, some mistakes are so egregious that they cast doubt on the intentions of the people involved even if they are corrected later.
To me, “let’s add false attribution to every commit by default without informing the user” falls squarely into that category. I don’t think I’ve ever worked in an environment where something like that wouldn’t have been red-flagged in three seconds by anyone who took even a casual glance. I’d honestly be embarrassed if such a proposal even made it into a public pull request for my organization, never mind that pull request getting merged.
If what you described would make it to our PR queue, it would definitely not pass the gates.
The idea was to track AI-only changes and add the trailer when such changes were detected AND the setting was enabled. Obviously, we didn't want to attribute all changes to AI. There is a bug in change detection (which slipped through testing), which led to even non-AI changes being tracked. And thus we have this problem.
The PR linked here wasn't even implementing the feature, it was changing the default for the setting.
Yeah, I think 4+ legged bots should be more common than 2-legged variants. 2 legs is neat, but it takes far more work and processing to control and balance. It also requires much more powerful legs. A spider bot has more legs, which makes it more "complex" in some ways, but individual legs don't need to hold and maneuver its entire body weight alone, and it can keep 3 points of ground contact at all times, even when moving around, making it exceptionally stable. A bipedal robot has to be able to hold something like twice its own body weight or more on a single foot in order to balance, maneuver, walk around and navigate obstacles.
> We'll know this works when it starts replacing Amazon pickers in quantity.
That doesn’t follow. There are plenty of tasks that can be fully and reliably automated but aren’t, for the simple reason that human labor is dirt cheap compared to advanced robotics.
I disagreed, then re-read your post, then re-read the OP, and now I've come full circle to apologize; I think you make a fair point.
I work at a biotech. We spent who knows how much time and money trying to develop a 'lab technician bot' to automate one of our critical assays. Turns out, a 6-figure machine still isn't as economical as my coworker Y, one of the veteran lab technicians. Sure, she takes the occasional sick day, but even at our volume (and we do industrial-level work, with multiple clients batched into a single assay pass) it won't be economical to replace her for a very long time (if we even reach that scale).
Absolutely. I worked at a gene sequencing company, where I led the software side of making a robotic product[0] to automate the 20-30 minutes of sample preparation time. It's great for lots of uses, but it doesn't cover anything outside the exact thing it automates. For that you need an expert human.
No problem! It was probably the most fun product I've ever had the pleasure of leading the software dev of.
The company is one of the few in the world that makes gene sequencing technology - actual chemistry, biologics, protocols, hardware and software. Plasmidsaurus is a customer[0] - they use our devices and have built an incredibly successful service on top of them!
What is the point of humanoid “general” robots then?
We already have pretty reliable ways to make and train humans.
Humans are cheaper and better than robots.
I could imagine robots for some specialised tasks where you don’t want to use a human (for security reasons, say), but you don’t need general-purpose robots for that.
In natural ecosystems, nobody beats the apex predator directly, and nobody beats the hyperspecialized niche critter at their own game. The new species has some advantage that’s different than what is there.
If a humanoid robot is a slower, dumber human that is expensive, requires power, can’t get wet, falls over, and doesn’t understand stairs, is not sleeping and being radiation-tolerant enough of an advantage to be worth it?
The nature comparison doesn't work on a fundamental level because you're only getting a fraction of the human's power based on how much they're happy to sell.
They already are. The problem with humanoid robots is that people think that adding legs to the robot will somehow fundamentally make it more intelligent.
People see a robot arm attached to a stationary platform and understand it requires integration work to perform a single task.
But when those same people see a humanoid robot, they think they can just talk to it like a real human and it will do what you told it to do.
They don't think about the fact that the humanoid robot has to be programmed exactly the same way the stationary robot arm has to be programmed and that programming the legs in addition to the arms is a much more challenging problem.
Technology gets cheaper over time. If they were always going to cost what they do now, you might have a point. But they'll eventually get much cheaper.
Robots can be optimized for tasks and if they are, their benefits are greater. When cars replaced the horse, it was because they didn’t poop, and because a car designed only for transport would not suddenly have a heart attack and stop working.
It’s far less frequent, is at least recoverable, and of course there’s no immediate public sanitation issue the way poop and dead horses attract flies and disease.
But at far less frequency and severity than a temperamental horse.
In Manhattan in 1900, 400 horses would die a day, and a rotting horse carcass is a far bigger sanitation problem than a broken down car, which you can tow and fix up.
A friend who works at Amazon made the same point: "We don't really need robots in the FCs urgently [other than the Kivas], because it turns out you can just pay people $17/hour"
Mechanical picking has been too slow. It's not a problem with the robot mechanics. Here's 300 picks/minute from 2012.[1] The parts are all the same, so the vision problem is simple.
But picking arbitrary objects from fulfillment bins is still running at a few picks per minute.[2] As the speed picks up, humans become less necessary.
That's the point of the test condition. When running a robot becomes more economical than paying full-scale humans $17/h, something important about robot abilities will have changed.
I dunno, I worked in an Amazon Warehouse for a year part-time (and a couple of weeks full-time when in between jobs). On one occasion, I pulled up to a bin full of nondescript cardboard boxes near where a group of trainees were working their way through, grabbed one box, spun it around for the six-sided box check, scanned it, and confirmed it was the right one. Before I could move on to my next pick, a trainee asked, "How did you know that was the right box?", which required a several-minute explanation of how the item description and the slight differences between the boxes led to that conclusion.
The big win would be training the folks doing stowing to not create such situations and to put markedly different things in each rainbow bin.
This would be a more convincing take if reasoning LLMs didn't already exist. Given the growth in capability over the last few years alone, nothing about your description ("several minute explanation of how the item description and the slight differentiations of the boxes") seems beyond an artificial intelligence to solve by the time humanoid robots are ready to physically traverse a warehouse.
Your last point is also interesting, given that a robot is perhaps more amenable to such instruction, creating cascading savings. Each human has to be trained and could individually be a failure; a robot can essentially copy its "brain" to all the others.
Or, likely more accurately, download the latest brain trained on all the robots' aggregate experiences from the Amazon hivemind HQ.
The "Markedly Different things" in each bin was a big Amazon Warehouse advance in warehousing. Traditionally - things that were "alike" were put on shelves/bins - but (according to Amazon) it was far more efficient for pickers (at least back in the day - may have changed since then) to have random things on shelves located near each other to allow for equal access to popular items by pickers.
I was thinking this week that AI token costs are probably going to get so expensive soon that bright spark CEOs are going to realise “why am I paying for such expensive coding agents when I can pay people from the third world to code!?!” and announce outsourcing like it’s some kind of stunning and innovative revelation.
C-suite has been saying this for 30+ years. They never tire of it. Ask yourself: At this point in time, why aren't all programmers working from low cost jurisdictions?
I think you didn’t grok the hidden punchline: this is the stage after they’ve replaced all their third-world coders with AI agents, until one day a C-suiter has the revelation that humans are cheaper and better, and the company then starts touting its humanistic credentials all over LinkedIn.
But that does follow. The economics working is not some outside factor. If the robot “could do the task” but would cost more than paying a human to do the same task then the robot “does not work”. It is frequently because the robot would be too slow, or not reliable enough, or could only handle certain types of items. But ultimately all of these boil down to cost.
We have seen lab demos of robotic manipulation for decades. The reason they stay in the lab (when they do) and don’t become ubiquitous is that they are not good enough. In other words, they don’t work. The economics and “does it work” are not two separate concerns but one and the same.
It's a continuum, not binary. The same robot that doesn't financially "work" for replacing a manual scavenger sorting garbage in an African slum might be quite cost-effective sorting recycling in Switzerland, and would likely have a niche regardless of price if used to (say) sort biohazardous or radioactive materials. And there are already millions of robots out there assembling cars etc.
Not “Here’s a random guess that I just pulled out of my ass.”
LLMs have picked up, from scientists, the bad habit of trying to give an answer when no answer can be given; scientists overall don’t say “I don’t know” nearly as often as they should.
I tried asking LLMs about food before. They all say "I can't tell for certain, but this is an estimate based on the ingredients I can spot/infer/guess".
You need to write a specific prompt to avoid any warnings.
Of course a lot of people don't know what limitations LLMs have, so there's some value to a blog post about it, but it's not as black-and-white as the article might suggest with its graphs.
The prompt (documented here: https://www.diabettech.com/wp-content/uploads/2026/04/Supple...) lists specific instructions and a specific output format that doesn't allow the LLM any room for explanation or warning in processable data (only in notes fields). In fact, the prompt explicitly tells the LLM to ignore visual inferencing for some statistics and to rely on a nutrition authority instead.
Even in that intentionally restricted format, the English language output uses words like "roughly" and "estimated" in the LLMs I've tested.
Sure, if you take the numeric values and plot them in graphs, you get wildly inconsistent results, but that research method intentionally restricts the usefulness and reliability of the LLMs being researched.
What's much more troubling is this line from the preprint:
> The open-source iAPS automated insulin delivery (AID) system now offers food analysis through APIs from OpenAI, Anthropic and Google [8]
The linked app does seem to have a disclaimer, though:
> "AI nutritional estimates are approximations only. Always consult with your healthcare provider for medical decisions. Verify nutritional information whenever possible. Use at your own risk."
From the paper, they're using structured JSON schema mode as opposed to freeform answers, so it can't. Models do typically caveat their answers for questions like this, in my experience.
They'll qualify their answers in English but as the article mentions, if your prompt asks for a confidence score, that "uncertainty" doesn't translate into low numerical confidence.
Quantifying their own confidence is also something they're not good at, and which the format would prevent them from refusing to do or preceding with a caveat if that's what you'd want of them. Particularly since the response format seems backwards - giving confidence, then carbs estimate, then observations/notes, rather than being able to base carbs estimate off of observations/notes and then confidence estimate off of both of those.
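To make the ordering point concrete, here's a hypothetical before/after (the field names are mine, not the paper's actual schema). Because generation is autoregressive, later fields can condition on earlier ones but not the reverse:

    # As described: confidence comes first, so it cannot depend on the
    # reasoning or the estimate that follow it.
    as_described = {"confidence": 0.9, "carbs_g": 45, "notes": "rice portion unclear"}

    # Reordered: observations first, then the estimate, then confidence,
    # so each value can be based on what was generated before it.
    reordered = {"notes": "rice portion unclear, assumed ~150 g cooked",
                 "carbs_g": 45,
                 "confidence": 0.6}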
> They'll qualify their answers in English but [...]
That the default user-facing chat as a normal user would use it gives a warning is the key part IMO. I don't think expectations of there being no "wrong way" to use the model can necessarily extend to API usage with long custom system prompt and restricted output format.
Serious question: What exactly do they love America for? I just don’t get it. Seems like in every way that matters to the common people, the US is at best mediocre.
Could it be that they secretly subscribe to a different version of the same mythical exceptionalism as the president they despise?
People love their home sports teams, even when they're losing. They love their kids, even when they're getting mediocre grades in school. It's like that.
You're thinking of nationalism, which is when people think their country is the best one. Real patriotism is loving your country just because it's yours.