I've done several experiments (and posted results in previous HN comments) where I've given GPT puzzles or brainteasers and asked it to review aspects of its answers Socratically. Never telling it it got anything wrong, just "you said A, then you said B, does that make sense?"
It usually does notice inconsistencies between A and B when asked this. But its ways of reconciling inconsistencies can be bizarre and suggest a very superficial understanding of concepts.
For example, it once reconciled an inconsistency by saying that, yes, 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.
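For the record, equality is preserved under multiplication by any number, however large, and Python's arbitrary-precision integers make that trivial to check (the multiplier below is an arbitrary made-up value):

```python
# Multiplying both sides of 2 * 2 = 4 by the same number preserves
# equality; Python ints have arbitrary precision, so size is irrelevant.
big = 987654321987654321987654321  # arbitrary large multiplier

left = (2 * 2) * big
right = 4 * big
assert left == right  # holds no matter how big the multiplier is
```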
I will be super impressed the day we have a model that can read an arithmetic textbook and come out with reliable arithmetic skills.
I have run into the same issue when using it for coding. It can easily debug simple code, but with libraries like Bazel I went down a two-hour rabbit hole of letting it debug an error, and it failed every time. Even with chain of thought, it had a very shallow understanding of the issue. Eventually I had to debug it myself.
> For example, it once reconciled an inconsistency by saying that, yes, 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.
Fair enough, but have you explained the axioms of arithmetic to it? It has only memorized examples it has seen; it has a right to be skeptical until it's seen our axioms and proofs about what is always true in mathematics.
When I was a child I was skeptical that an odd number plus an even number is always odd (and so on) for very large numbers, until I saw it proven to me by induction (when I was 6, I think; imo this was reasonable skepticism).
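For what it's worth, that particular fact also has a short direct proof (a parity argument rather than induction), sketched here in LaTeX:

```latex
\text{Let } a = 2m + 1 \text{ (odd) and } b = 2n \text{ (even), for integers } m, n. \\
a + b = (2m + 1) + 2n = 2(m + n) + 1,
\text{ which is odd by definition, for numbers of any size.}
```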
Now, ChatGPT probably has seen these proofs, to be fair, but it may not be connecting the dots well enough yet. I would expect this in a later version that has been specifically trained to understand math (by which I really mean math, not just performing calculations). And imagine what such models will prove for us then!
I think GPT has read about as many textbooks on arithmetic as I have, and the difference between us is entirely in the intelligence to absorb the contents and apply them logically with consistent adherence to the rules.
I think one problem with these models is that all their knowledge is soft. They never learn true, universal rules. They seem to know the rules of grammar, but only because they stick to average-sounding text, and the average text is grammatical. At the edges of the distribution of what they've seen, where the data is thin, they have no rules for how to operate, and their facade of intelligence quickly falls apart.
People can reliably add numbers they've never seen before. The idea that it would matter whether the number has been seen before seems ridiculous and fundamentally off-track, doesn't it? But for GPT, it's a crapshoot, and it gets worse the farther it gets away from stuff it's seen before.
Make the number you multiply by essentially the concatenation of a long series of random digits, and I can just about guarantee most humans will get different results on the two sides, because they'll make one or more mistakes doing the math. That is, of course, assuming the humans don't have suitable traditional computer tools capable of handling such a scenario.
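And with such a tool, the check is trivial at any size; a quick sketch in Python, building the multiplier exactly as described above (the digit count is an arbitrary choice):

```python
import random

# Build a large random-ish multiplier by concatenating random digits.
random.seed(0)  # fixed seed so the example is reproducible
digits = "".join(str(random.randint(0, 9)) for _ in range(100))
k = int(digits)

# A rule-following system gets the same result on both sides every time,
# because equality is preserved under multiplication regardless of size.
assert (2 * 2) * k == 4 * k
```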
You don't see how asking humans to multiply both sides of 2 * 2 = 4 by the same, very large, random-ish number, and expecting that they'll get different things is relevant to this:
> 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.
You know, the very same scenario I pulled from your comment?
There's no free tier that I know of. But, yes, it is drastically better, and it's specifically much less prone to hallucinate "proofs" that the previous answer is correct if you challenge it.
If you provide the inputs for some specific task where you expect GPT-4 to fail in this manner, I can give it a try.