Hacker News

Sorry, but that doesn't sound very reasonable at all. I'm also not sure what you mean by "arithmetical structure" to be honest.

In any case, I think you're applying an overly permissive criterion for learning "generic rules of arithmetic". It's clear from the paper linked above that GPT-3 is extremely limited in its ability to return correct results given arithmetic operations as input. The only task that it performs with 100% accuracy is two-digit addition. It cannot even perform two-digit subtraction with perfect accuracy and it's all downhill from there.

Furthermore, like I say, division is conspicuously absent from the set of tested tasks reported in the paper, as are any operations with more than five digits. Going again by my heuristic from an earlier comment, that researchers publish positive results and avoid publishing negative results, this tells us that GPT-3 can't perform division at all with any accuracy, and can't perform arithmetic operations with more than five digits with any accuracy.

That is hardly the hallmark of a system that has "learned generic rules of arithmetic". It is far more likely that GPT-3 has learned to reproduce results that it has seen during training. Even more so since, as I say in my comment above, it is much better at operations that are likely to be found more often in a natural language corpus.



But why think performing arithmetic with 100% accuracy is required? Children learning arithmetic aren't perfectly accurate, but they're certainly learning arithmetic. The fact that there is a digit cut-off where the quality of its results drops off isn't all that surprising either. How much arithmetic can you do in your head? I'm likely to fail at some point with two-digit addition without using pencil and paper; with three digits I would be significantly worse. Your criterion for what counts as "learning arithmetic" doesn't seem to be based on anything substantive.

The cliff in GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations. That is, it can't reprocess and refine a tentative answer to improve it. You can't do arbitrary arithmetic on a finite amount of substrate without this sort of recursion or recurrence. The fact that it can only do two digits with 100% accuracy could be a hardware or architecture limitation.
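To make the depth argument concrete, here's a toy illustration in Python (mine, not anything from the paper; the function names and parameters are invented for the sketch). Treat each pass over the digits as one fixed "layer" that can move a carry exactly one position to the left: a chain of carries longer than the depth is then never fully resolved.

```python
def carry_layer(digits):
    # One bounded "layer": each position keeps its low digit and hands
    # its carry exactly one place to the left (digits are least-significant
    # first). No position can see a carry from more than one step away.
    padded = digits + [0]
    return [padded[i] % 10 + (padded[i - 1] // 10 if i else 0)
            for i in range(len(padded))]

def add_fixed_depth(x, y, depth):
    # Digit-wise sum of two equal-length digit lists, followed by a fixed
    # number of carry layers. A carry chain longer than `depth` survives
    # as an unresolved position >= 10 in the output.
    digits = [a + b for a, b in zip(x, y)]
    for _ in range(depth):
        digits = carry_layer(digits)
    return digits
```

For 999 + 1 (a carry rippling through three positions), depth 3 resolves every position to a proper digit, while depth 2 leaves one position still holding an unresolved carry.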


>> But why think performing arithmetic with 100% accuracy is required?

Because otherwise, how do you know that your system has learned the "rules of arithmetic", as per your comment, and not something completely different? And like I say in my other comments, there's a very obvious candidate for what that something completely different could be: a representation of already-seen results.

Besides, GPT-3 is a piece of software; it's not a child or an adult, whose memory can fail or who can get overwhelmed by the complexity of executing a complex set of rules. If a piece of software implements a set of rules, it's usually able to execute them correctly every time, without failure, certainly for rules as simple as those of arithmetic. Pocket calculators with tiny resources can do that, and they can do it with very long sequences of digits, so why would a huge language model, running on very expensive hardware, fail?

>> The cliff for GPT-3's arithmetic ability is likely due to the fact that it can't do recursive/recurrent calculations.

Well, yes, exactly that. If a system can't represent recursion then it can't represent arithmetic between arbitrary numbers. Hell, without recursion, a system can't even count to arbitrary numbers. So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?

Actually, your observation about recursion is the first thing I'd normally have said, but it doesn't seem to be commonly understood that neural networks (and propositional, attribute-value learners in general) cannot represent recursion. Similarly, such systems can't represent non-ground values, that is, they can't represent the concept of a variable. But that's a big part of why they can't build general theories. In terms of arithmetic, it means they can't represent the relation x + y = z, because they can't represent x, y and z as universally quantified variables. The only remaining alternative is to represent every ground expression, like 1 + 1 = 2, 1 + 2 = 3, etc. But that's not the rules of arithmetic! That's only some instances of specific operations. That is why GPT-3 hasn't learned arithmetic and can't learn arithmetic, no matter how much data it is fed. It's just not possible to represent the rules of arithmetic in a propositional language. A first-order language and the ability to define relations recursively are necessary.

Edit: OK, sorry, my claim about a first order language being necessary is maybe hard to substantiate outside of Peano arithmetic. But, recursion and the ability to represent variables are absolutely necessary. See primitive recursive functions: https://en.wikipedia.org/wiki/Primitive_recursive_function.
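For illustration (my own sketch, not anything from the thread or the linked article), here is addition written as a primitive recursive definition in Python. The two defining equations mention the variables x and y, and the definition calls itself on a smaller argument, which is exactly the pair of capabilities (variables and recursion) at issue:

```python
def add(x: int, y: int) -> int:
    """Peano-style addition by primitive recursion (illustrative sketch)."""
    if y == 0:
        return x               # add(x, 0) = x
    return add(x, y - 1) + 1   # add(x, succ(y)) = succ(add(x, y))
```

Two rules cover every pair of natural numbers. A ground table ("1 + 1 = 2, 1 + 2 = 3, ...") would instead need one entry per pair, which is the contrast being drawn above.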


>Because otherwise, how do you know that your system has learned the "rules of arithmetic", as per your comment, and not something completely different?

Presumably because it answers correctly on examples it hasn't explicitly seen in training. While it's plausible that it has seen all two-digit sums during the course of training, it's not a given.

>Besides, GPT-3 is a piece of software, it's not a child or a grown up human, who can make mistakes because their memory fails or because they get overwhelmed by the complexity of executing a complex set of rules.

GPT-3 can become "overwhelmed" by the complexity of the problem extending beyond its feed-forward computation window.

>If a piece of software implements a set of rules, it's usually able to execute them right every time, without failure, certainly so for relatively simple rules like arithmetic.

But a computer system that "computes" through manipulations of language representations is fundamentally different from the computer systems that came before. Carrying over intuitions from computers as bit-manipulators to manipulators of language representations is a mistake.

> so why would a huge language model, running on very expensive hardware, fail?

Impedance mismatch? It turns out performing tasks on a computational substrate not suited to those tasks comes with severe drawbacks. But we already knew that.

>So in what sense can GPT-3 be said to have "learned the rules of arithmetic"? Learned them, how, if it can't represent them?

It could know how to sum individual digits through memorization and learn the carry rule. It may be incapable of recursion, and thus incapable of summing arbitrarily long numbers. But learning the carry rule is most of the way there.
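As a sketch of what "memorized digit sums plus the carry rule" buys you (my illustration, with invented names, not anything from the GPT-3 paper): schoolbook addition needs only a 10x10 lookup table and the carry rule. The loop over digit positions is the recurrent step that a fixed-depth forward pass can't unroll for arbitrarily long numbers.

```python
# "Memorized" single-digit sums: a finite lookup table, no rule needed.
DIGIT_SUMS = {(a, b): a + b for a in range(10) for b in range(10)}

def schoolbook_add(x: str, y: str) -> str:
    # Pad to equal length, then work right to left: look up each digit
    # pair, apply the carry rule, and propagate the carry. The loop is
    # the part that requires recurrence.
    x, y = x.zfill(len(y)), y.zfill(len(x))
    carry, out = 0, []
    for a, b in zip(reversed(x), reversed(y)):
        s = DIGIT_SUMS[(int(a), int(b))] + carry  # table lookup + carry rule
        out.append(str(s % 10))
        carry = s // 10
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))
```

Fix the number of loop iterations instead of letting it track the input length, and you get exactly the kind of digit cut-off discussed above.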

>Similarly, such systems can't represent non-ground values, that is they can't represent the concept of a variable.

I see no reason to accept this. Multi-layer networks seem to be well suited to abstract representations and manipulations of non-ground values. Ground values are the input to the network, but higher layers represent the abstract properties of the ground values within their receptive field, rather than their particulars. For example, the location and direction of an edge, rather than the particular pixels that form the edge.


>> I see no reason to accept this.

Yes, I'm aware it's very difficult to get people to believe this outside of AI research. Of course, it is entirely uncontroversial and very well understood among researchers. For example, last year I was at a presentation by a researcher at DeepMind who works on neuro-symbolic integration. He was asked a question along the lines of "how can you model first order logic without variables?", and he pointed out that he had a footnote on one of his slides noting this limitation, and that work was underway to address it.

Regarding arithmetic, none of the points made in your comment are made in the GPT-3 paper. In fact, the paper makes no attempt to explain what makes GPT-3 capable of performing arithmetic, other than to say that its mistakes in carrying the one suggest it is actually trying to perform computation and failing. So I have to ask: where do these points come from?

What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from? I apologise if this comes across as personal or unfair, but many commenters in this thread and in similar conversations express strong opinions and give detailed explanations of how GPT-3 and similar models work. I am always left wondering where all this information comes from, given that it usually can't be found in the sources where I'd expect to find it, namely the work being discussed (in this case, the GPT-3 paper).


>For example, I was in a presentation by a gentleman who works at DeepMind last year and who works on neuro-symbolic integration

Sure, neural networks don't operate on proper variables, so in the context of neuro-symbolic processing I'm sure this is a significant hurdle. But in general, abstract representation is part and parcel of what makes deep learning powerful. And such an abstract representation is all that's needed for a neural arithmetic unit.

Here[1] is a study on GPT-2 demonstrating that its middle layers develop representations of syntax and part of speech, the sorts of abstract representations that would be needed to develop a mechanism for abstract arithmetic.

>What I mean is, you seem to have a theory about how GPT-3 works. Where does it come from?

Studies like the one mentioned, and reasonable extrapolation from knowledge of DL and other transformer architectures. We are not totally ignorant on how GPT-3 works.

[1] https://aletheap.github.io/posts/2020/07/looking-for-grammar...


My comment above discussed the inability of neural networks (and propositional, attribute-value learners in general) to represent variables. I'm sorry, but I can't see how your comment or the post you link to show that neural networks can represent variables.

I do not quite understand the relation between "abstract representations that would be needed to develop a mechanism to do abstract arithmetic" and variables. I'm also not sure what you mean by "abstract arithmetic", or what mechanisms you mean. Can you please explain?

Also, I had thought we shared an understanding that the ability to represent primitive recursive functions (which presupposes the ability to represent variables and recursion) is necessary to represent arithmetic. Your comment above now makes me doubt this too. Can you clarify?

Finally, the link above is a blog post. I wouldn't call it a study. But, can you say where in that post I can find the theory about GPT-3's function that you express above?



