zozbot234's comments | Hacker News

Obligatory reminder that today's so-called "AGI" has trouble figuring out whether I should walk or drive to the car wash in order to get my dirty car washed. It has to think through the scenario step by step, whereas any human can instantly grok the right answer.

The idea/hope is that a video model would answer the car wash problem correctly. These are exactly the kinds of issues you have to solve to avoid teleporting objects around in a video, so whenever we manage more than a couple of seconds of coherent video, we will have something that understands the real world much better than text-based models do. Then we "just" have to somehow make a combined model that has this kind of understanding and can also write text and make tool calls.

Yes, this is kind of like Tesla promising full self-driving in 2016.


I just don't know how to engage with these criticisms anymore. Do you not see how increasingly convoluted the "simple question LLMs can't answer" bar has gotten since 2022? Do the human beings you know not have occasional brain farts where they recommend dumb things that don't make much sense?

> Do the human beings you know not have occasional brain farts where they recommend dumb things that don't make much sense?

I completely agree. I'm ashamed to admit, I've actually walked to the car wash without my car on more than one occasion. We all make mistakes!


> Do the human beings you know not have occasional brain farts where they recommend dumb things that don't make much sense?

Not that dumb, no. That's why it's laughable to claim that LLMs are intelligent.


I should note for epistemic honesty that I expected I would be able to come up with an example of a mistake I made recently that was clearly equally dumb, and now I don't have a response to offer because I can't actually come up with that example.

What are you talking about? OpenAI's ChatGPT free tier (that everyone uses) answers this in the first sentence within a couple seconds.

"If your goal is to get your dirty car washed… you should probably drive it to the car wash "


That problem went viral weeks ago, so it is no longer a valid test. At the time it was consistently tripping up all the SOTA models at least 50% of the time (you also need a sample size > 1, given the huge variation across attempts even with the exact same wording).

The large hosted model providers always "fix" these issues as best they can after they become popular. It's a pattern repeated many times now, and the providers benefit from exactly this scenario: the problem seems "debunked" well after the fact. Often the original behavior can be reproduced once the wording/numbers/etc. are moved sufficiently far from the original prompt.
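The "sample > 1" point above can be sketched as a tiny evaluation harness. This is a minimal sketch: `ask_model` is a hypothetical stand-in for a real API call, simulated here with a coin flip so the example is self-contained.

```python
import random

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call. A coin flip
    # simulates a model that answers correctly only part of the
    # time, as described in the comment above.
    return "drive" if random.random() < 0.5 else "walk"

def failure_rate(prompt: str, expected: str, n: int = 100) -> float:
    # Ask the same question n times and report the fraction of
    # wrong answers; a single sample says almost nothing when
    # outputs vary this much between attempts.
    wrong = sum(ask_model(prompt) != expected for _ in range(n))
    return wrong / n

rate = failure_rate("Should I walk or drive to the car wash?", "drive")
```

With a real endpoint swapped in for `ask_model`, the same loop measures how often a given wording trips the model up, rather than relying on one anecdotal reply.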


For example, I just asked ChatGPT "The boat wash is 50 meters down the street. Should I drive, sail, or walk there to get my yacht detailed?" and it recommended walking. I'm sure with a tiny bit more effort, OpenAI could patch it to the point where it's a lot harder to confuse with this specific flavor of problem, but it doesn't alter the overall shape.

This question is obviously ambiguous. The context here on HN is "questions LLMs are stupid about": when I mention a boat wash, clearly the intended reading is that you should take the boat to the boat wash.

But this question posed to humans is plenty ambiguous, because it doesn't specify whether you need to bring the boat at all, or whether the boat is already at the wash. ChatGPT Free Tier handles the ambiguity; note the finishing remark:

"If the boat wash is 50 meters down the street…

Drive? By the time you start the engine, you’re already there.

Sail? Unless there’s a canal running down your street, that’s going to be a very short and very awkward voyage.

Walk? You’ll be there in about 40 seconds.

The obvious winner is walk — unless this is a trick question and your yacht is currently parked in your living room.

If your yacht is already in the water and the wash is dock-accessible, then you’d idle it over. But if you’re just going there to arrange detailing, definitely walk."


You can make the argument that the boat variant is ambiguous (though it's a stretch), but that's not really relevant, since the point was to reveal that the underlying failure mode is unchanged, just concealed now.

The original car question is not ambiguous at all. And the specific responses to it weren't about ambiguity either: in some examples the logic was borderline LLM psychosis of the kind you'd see in GPT-3.5, just papered over by the well-spoken "intelligence" of a modern SOTA model.


I don't understand what occasional hiccups prove. The models can pass college acceptance tests in advanced educational topics better than 99% of the human population, and because they occasionally have a shortcoming, it means they're worse than humans somehow? Those edge cases are quickly going from 1% -> 0.01% too...

"any human can instantly grok the right answer."

Ask a human about general world knowledge and they don't have the breadth to give good answers for 90% of it. Even on very basic questions like this one, humans will trip up far more often than the frontier LLMs do.


Even the rise of high-level languages did not lead to a "developer-less future". What it did was improve productivity and make software cheaper by orders of magnitude; but compiler vendors did not benefit all that much from the shift.

A high-level language or a compiler wasn't automating end-to-end reasoning for a programming task.

SAE level 2 driver assistance is explicitly not autonomous driving.

They've previously defined AGI as an AI that can directly create $100B in economic value.

Hmm interesting, thanks. I wonder how much value it's already created.

That number is probably negative

OTOH, out-of-order/speculative execution attacks only amount to information disclosure. And general-purpose OSes (without mandatory access control or multilevel security, which are of mere academic interest) were never designed to protect against that.

A far greater problem is that until very recently, practical memory safety required the use of inefficient GC. Even a largely memory-safe language like Rust actually requires runtime memory protection unless stack depth requirements can be fully determined at compile time (which they generally can't, especially if separately-provided program modules are involved).
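The stack-depth point can be illustrated with a Python analogue: recursion depth here depends on a runtime value, so no static analysis can bound the stack in general, and the runtime has to enforce a limit instead (much as Rust relies on runtime stack protection). This is a sketch of the general problem, not of Rust's specific mechanism.

```python
import sys

def nest(n: int) -> int:
    # The depth of this recursion depends on the runtime value n,
    # so the required stack cannot be determined at compile time.
    return 0 if n <= 0 else 1 + nest(n - 1)

sys.setrecursionlimit(100)  # a runtime guard, analogous in spirit to a guard page
try:
    nest(10_000)            # would need far more stack than the limit allows
    overflowed = False
except RecursionError:
    overflowed = True       # the runtime check caught it safely

sys.setrecursionlimit(10_000)  # restore a sane limit afterwards
```

The same shape in a compiled language is why a stack guard (checked at runtime) is needed even when the language itself is memory-safe.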


> and clean up any LLM-ness afterwards

That never happens. It's actually easier to write the code from scratch and avoid the LLM-ness altogether.




There are lots of tools that aren't worthwhile to learn to use, and in particular, learning to work with the poor-quality output of subpar tools is not something I'm interested in.

The skill of cleaning up LLM-written slop to bring it to the human-like quality that any sane FLOSS maintainer would demand to begin with? It's just not worth it.

AOSP patched kernels still include some features that are not in the mainline version. The LineageOS folks are working on support for mainline kernels, but AIUI it's not there yet.

HBM is not normal memory. It uses a lot more die area per bit and has lower yield too. So a Gb of baseline DRAM and a Gb of HBM are very different measurements: the latter represents far more silicon per bit.

Powering down unused physical RAM is absolutely a thing on some systems. For one thing, it's required if you ever want to support physical memory hotplug. The real issue, however, is that the gain from skipping DRAM refresh is clearly negligible: it's no more than the difference between putting a computer to sleep (ACPI S3), or putting a phone to sleep in airplane mode, and powering it off entirely.
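A back-of-the-envelope version of that bound: all wattages below are hypothetical round numbers chosen only to show the scale of the comparison, not measurements of any real device.

```python
# All figures are assumed, illustrative round numbers.
sleep_draw_w = 1.0    # assumed draw in S3 sleep, with DRAM in self-refresh
off_draw_w = 0.1      # assumed draw when fully powered off
active_draw_w = 25.0  # assumed draw of the running system

# The sleep-vs-off gap is an upper bound on what refresh can cost,
# since S3 keeps little running besides DRAM self-refresh.
refresh_bound_w = sleep_draw_w - off_draw_w

# Relative to active power, the possible saving is a rounding error.
fraction_of_active = refresh_bound_w / active_draw_w
```

Whatever the real numbers are for a given machine, the structure of the argument is the same: refresh power is bounded by a gap that is already tiny compared to active draw.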

And you're saying Apple is doing that on the iPhone?

50% is nothing when RAM is up 500% or so.
