Hacker News | AlphaAndOmega0's comments

Agreed. I find that particularly annoying, and I also seem to find that the spatial arrangement or stereo effect is muted for most instruments (or the model simply doesn't use that feature as well as a good human musician).

I'm a psychiatry resident who finds LLM research fascinating because of how strongly it reminds me of our efforts to understand the human brain/mind.

I dare say that in some ways we understand LLMs better than we understand humans, or at least that the interpretability tools are now superior. An awkward place to be, but an interesting one.


LLMs are orders of magnitude simpler than brains, and we literally designed them from scratch. Also, we have full control over their operation and we can trace every signal.

Are you surprised we understand them better than brains?


We've been studying brains a lot longer. LLMs are grown, not built. The part that is designed is the low-level architecture, but what grows from that is incomprehensible and unplanned.

It's not that much longer, really.

LLMs draw their origins from both n-gram language models (ca. 1990s) and neural networks and deep learning (ca. 2000). So we've only had really good ones for maybe 6-8 years, but the roots of the study go back at least 30 years.

Psychiatry, psychology, and neurology, on the other hand, are really only roughly 150 years old. Before that, there wasn't enough information about the human body to be able to study it, let alone the resources or biochemical knowledge necessary to understand it or do much of anything with it.

So, sure, we've studied it longer. But only 5 times longer. And, I mean, we've studied language, geometry, and reasoning for literally thousands of years. Markov chains are like 120 years old, so older than computer science, and you need those to make an LLM.

And if you think we went down some dead-end directions with language models in the last 30 years, boy, have I got some bad news for you about how badly we botched psychiatry, psychology, and neurology!


Embedding "meaning" in vector spaces goes back to 1950s structuralist linguistics and early information retrieval research; there is a nice overview in the draft of the 3rd edition of Speech and Language Processing: https://web.stanford.edu/~jurafsky/slp3/5.pdf

You are still talking about low-level infrastructure. This is like studying neurons only from a cellular biology perspective and then trying to understand language acquisition in children. It is very clear from the recent literature that the emergent structure and behavior of LLMs are a new research field in their own right.

"Designed" is a bit strong. We "literally" couldn't design programs to do the interesting things LLMs can do. So we gave a giant for loop a bunch of data and a bunch of parameterized math functions and just kept updating the parameters until we got something we liked.... even on the architecture (ie, what math functions) people are just trying stuff and seeing if it works.


> We "literally" couldn't design programs to do the interesting things LLMs can do.

That's a bit of an overstatement.

The entire field of ML is aimed at problems where deterministic code would work just fine, but the amount of cases it would need to cover is too large to be practical (note, this has nothing to do with the impossibility of its design) AND there's a sufficient corpus of data that allows plausible enough models to be trained. So we accept the occasionally questionable precision of ML models over the huge time and money costs of engineering these kinds of systems the traditional way. LLMs are no different.


Saying ML is a field where deterministic code would work just fine conveniently leaves out the difficult part: writing the actual code, which we haven't been able to do for most of the tasks at hand.

What you are saying is fantasy nonsense.


They did not leave it out.

> but the amount of cases it would need to cover is too large to be practical (note, this has nothing to do with the impossibility of its design)


It's not only too large - we can't even enumerate all the edge cases, let alone handle them. It's too difficult.

Using your logic, we don't need quantum computers to break encryption; we could just use pen and paper.

And all you have to do is write an infinite amount of code to cover all possible permutations of reality! No big deal, really.

> would work just fine, but the amount of cases it would need to cover is too large to be practical

So it doesn't work.


It is impossible to design even in a theoretical sense if the functional requirements consider matters such as performance and energy consumption. If you have to write petabytes of code, you also have to store and execute it.

[flagged]


I'm a psychiatry resident who has been into ML since... at least 2017. I even contemplated leaving medicine for it in 2022 and studied for that, before realizing that I'd never become employable (because I could already tell the models were improving faster than I was).

You would be sorely mistaken to think I'm utterly uninformed about LLM research, even if I would never dare to claim to be a domain expert.


> Also, we have full control over their operation and we can trace every signal. Are you surprised we understand them better than brains?

Very, monsieur Laplace.


To be fair to your field, that advancement seems expected, no? We can do things to LLMs that we can't ethically or practically do to humans.

I'm still impressed by the progress in interpretability. I remember being quite pessimistic that we'd achieve even what we have today (and I recall that being the consensus among ML researchers at the time). In other words, while capabilities have advanced at about the pace I expected from the GPT-2/3 days, mechanistic interpretability has advanced even faster than I'd hoped for (though in some ways we are still very far from completely understanding how LLMs work).

I would be curious to know more precise numbers. My intuition suggests that when Sony sells millions of them, the number diverted for non-gaming purposes is maybe thousands or tens of thousands.


Nearly 90 million units by the time it was discontinued, but I'm not sure how many were sold at the point they removed Linux support.


The programming workspace of the future will have three employees:

A man, a dog and an instance of Claude.

The dog writes the prompts for Claude, the man feeds the dog, and the dog stops the man from turning off the computer.


Thank you for the good laugh! This whole thread is peak satire. Although, be careful. It reminds me of the foreword to a short story someone shared on HN recently: "[…] Read it and laugh, because it is very funny, and at the moment it is satire. If you're still around forty years from now, do the existing societal equivalent of reading it again, and you may find yourself laughing out of the other side of your mouth (remember mouths?). It will probably be much too conservative." — https://www.baen.com/Chapters/9781618249203/9781618249203___...


I for one welcome our furry overlords


You're right. They did it. The old man and dog joke has been realized, but the real answer of the future turned out to be: "the dog programs the game, and the man feeds the treat hopper."


That was funny. Gave me a good laugh. Thanks.


Previous models from competitors usually got that correct, and the reasoning versions almost always did.

This kind of reflexive criticism isn't helpful, it's closer to a fully generalized counter-argument against LLM progress, whereas it's obvious to anyone that models today can do things they couldn't do six months ago, let alone 2 years back.


I'm not denying any progress. I'm saying that simple reasoning failures that have gone viral are exactly the kind of thing they will toss into the training data. Why wouldn't they? There's real reputational risk in not fixing it and no cost in fixing it.


Given that Gemini 3 Pro already did solidly on that test, what exactly did they improve? Why would they bother?

I double checked and tested on AI Studio, since you can still access the previous model there:

>You should drive.
>If you walk there, your car will stay behind, and you won't be able to wash it.

Thinking models consistently get it correct and did when the test was brand new (like a week or two ago). It is the opposite of surprising that a new thinking model continues getting it correct, unless the competitors had a time machine.


Why would they bother? Because it costs essentially nothing to add it to the training data. My point is that once a reasoning example becomes sufficiently viral, it ceases to be a good test, because companies have a massive incentive to correct it. The fact that some models got it right before (unreliably) doesn't mean they wouldn't want to ensure that the model gets it right.


Gary Marcus successfully predicted all ten of the one AI Winters.

He also claimed that LLMs were a failure because of prompts that GPT-3.5 couldn't parse, after the launch of GPT-4, which handled them with aplomb.


Nanobots are fantasy? Nobody told your cells or bacteria I guess. We have an existence proof right there.


Show us how to build machines, factories, mines, and chip fabs, smelt steel, and so forth out of those bacteria and cells, and you might have a point.


Same here. I did notice what I think was an actual error on someone's part: there was a chart in the files comparing black and white IQ distributions, and, well, just look at it:

https://nitter.net/AFpost/status/2017415163763429779?s=201

Something clearly went wrong in the process.


GPS jamming for incoming drones?


Yep


>Interactive Human Simulator is a bold way to describe spinning up a few GPT calls with mood sliders, but sure, let’s call it anthropology. Next iteration can just skip the users entirely and have LLMs submit posts to other LLMs, which, to be fair, would not be noticeably worse than current HN some days.

My sides

