More

NitpickLawyer · 2026-03-24T06:29:26 1774333766

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

datsci_est_2015 · 2026-03-24T09:01:23 1774342883

I’ve seen this style of take so much that I’m dying for someone to name a logical fallacy for it, like “appeal to progress” or something.

Step away from LLMs for a second and recognize that “Yesterday it was X, so today it must be X+1” is such a naive take and obviously something that humans so easily fall into a trap of believing (see: flying cars).

Gareth321 · 2026-03-24T10:23:24 1774347804

In finance we say "past performance does not guarantee future returns." Not because we don't believe that, statistically, returns will continue to grow at x rate, but because there is a chance that they won't. The reality bias is actually in favour of these getting better faster, but there is a chance they do not.

mikkupikku · 2026-03-24T13:38:52 1774359532

Logical fallacies are vastly overrated. Unless the conversation is formal logic in the first place, "logical fallacies" are just a way to apply quick pattern matching to dismiss people without spending time on more substantive responses. In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

datsci_est_2015 · 2026-03-24T14:01:01 1774360861

Hard to make a more substantive response when the OP’s entire comment was a one-sentence logical fallacy. I’m not cherry-picking here.

> In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

One of us is making a much grander claim than the other:

  - LLMs have limitless potential for growth; because they are not capable of something today does not mean they won’t be capable of it tomorrow
  - LLMs have fundamental limitations due to their underlying architecture and therefore are not limitless in capability

fenomas · 2026-03-24T15:14:54 1774365294

The post you replied to was:

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty (the first thing you jumped on them for), and certainly not anything about "limitless potential for growth" (which nobody even mentioned). With replies like this, invoking fallacies and attacking claims nobody made, you're adding a lot of heat and very little light here (and a few other threads on the page).

graemep · 2026-03-24T14:12:54 1774361574

OK, its not a logical fallacy, its a false assumption.

The belief in the inevitability of progress is a bad assumption. Especially if you assume a particular technology will keep advancing.

famouswaffles · 2026-03-24T13:24:17 1774358657

Well if people give the exact same 'reasons' why it could not do x task in the past that it did manage to do then it is tiring seeing the same nonsense again. The reason here does not even make much sense. This result is not easily verifiable math.

torginus · 2026-03-24T13:09:28 1774357768

Yeah, and even if we accept that models are improving in every possible way, going from this to 'AI is exponential, singularity etc.' is just as large a leap.

gf000 · 2026-03-24T13:09:27 1774357767

https://xkcd.com/605/

snemvalts · 2026-03-24T08:54:46 1774342486

Scaling law is a power law , requiring orders of magnitude more compute and data for better accuracy from pre-training. Most companies have maxed it out.

For RL, we are arriving at a similar point https://www.tobyord.com/writing/how-well-does-rl-scale

Next stop is inference scaling with longer context window and longer reasoning. But instead of it being a one-off training cost, it becomes a running cost.

In essence we are chasing ever smaller gains in exchange for exponentially increasing costs. This energy will run out. There needs to be something completely different than LLMs for meaningful further progress.

Validark · 2026-03-24T07:32:59 1774337579

I tend to disagree that improvement is inherent. Really I'm just expressing an aesthetic preference when I say this, because I don't disagree that a lot of things improve. But it's not a guarantee, and it does take people doing the work and thinking about the same thing every day for years. In many cases there's only one person uniquely positioned to make a discovery, and it's by no means guaranteed to happen. Of course, in many cases there are a whole bunch of people who seem almost equally capable of solving something first, but I think if you say things like "I'm sure they're going to make it better" you're leaving to chance something you yourself could have an impact on. You can participate in pushing the boundaries or even making a small push on something that accelerates someone else's work. You can also donate money to research you are interested in to help pay people who might come up with breakthroughs. Don't assume other people will build the future, you should do it too! (Not saying you DON'T)

3abiton · 2026-03-24T07:51:24 1774338684

The problem class is rather very structured which makes it "easier", yet the results are undeniably impressive

number6 · 2026-03-24T06:40:45 1774334445

But can it count the R's in strawberry?

Paradigma11 · 2026-03-24T07:06:27 1774335987

That question is equivalent to asking a human to add the wavelengths of those two colors and divide it by 3.

snovv_crash · 2026-03-24T07:28:33 1774337313

Unless you're aware of hyperspectral image adapters for LLMs they aren't capable of that either.

szszrk · 2026-03-24T07:40:22 1774338022

Unfair - human beats AI in this comparison, as human will instantly answer "I don't know" instead of yelling a random number.

Or at best "I don't know, but maybe I can find out" and proceed to finding out/ But he is unlikely to shout "6" because he heard this number once when someone talked about light.

koliber · 2026-03-24T07:51:24 1774338684

> human will instantly answer "I don't know" instead of yelling a random number.

Seems that you never worked with Accenture consultants?

szszrk · 2026-03-24T11:45:47 1774352747

Fair.

Yet this can be filtered with fixed rules, like "output produced by corporate structures is untrusted random data".

thegabriele · 2026-03-24T09:15:06 1774343706

Why is that?

Paradigma11 · 2026-03-24T10:33:36 1774348416

Because LLMs dont have a textual representation of any text they consume. Its just vectors to them. Which is why they are so good at ignoring typos, the vector distance is so small it makes no difference to them.

Aditya_Garg · 2026-03-24T06:47:29 1774334849

yes its ridiculously good at stuff like that now. I dare you to try and trick it.

frizlab · 2026-03-24T07:01:03 1774335663

https://news.ycombinator.com/item?id=47495568

thedatamonger · 2026-03-24T07:14:53 1774336493

what bothers me is not that this issue will certainly disappear now that it has been identified, but that that we have yet to identify the category of these "stupid" bugs ...

sigmoid10 · 2026-03-24T07:32:34 1774337554

We already know exactly what causes these bugs. They are not a fundamental problem of LLMs, they are a problem of tokenizers. The actual model simply doesn't get to see the same text that you see. It can only infer this stuff from related info it was trained on. It's as if someone asked you how many 1s there are in the binary representation of this text. You'd also need to convert it first to think it through, or use some external tool, even though your computer never saw anything else.

Measter · 2026-03-24T13:38:33 1774359513

> It's as if someone asked you how many 1s there are in the binary representation of this text.

I'm actually kinda pleased with how close I guessed! I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964.

Then I ran your message through a program to get the actual number, and turns out it has 1800 exactly.

datsci_est_2015 · 2026-03-24T09:05:52 1774343152

Okay but, genuinely not an expert on the latest with LLMs, but isn’t tokenization an inherent part of LLM construction? Kind of like support vectors in SVMs, or nodes in neural networks? Once we remove tokenization from the equation, aren’t we no longer talking about LLMs?

fenomas · 2026-03-24T10:07:20 1774346840

It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that can flawlessly count letters in words, they could train one with a naive tokenizer (like just ascii characters). But the resulting model would be very bad (for its size) at language or reasoning tasks.

Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.

nopinsight · 2026-03-24T08:02:59 1774339379

LLMs in some form will likely be a key component in the first AGI system we (help) build. We might still lack something essential. However, people who keep doubting AGI is even possible should learn more about The Church-Turing Thesis.

https://plato.stanford.edu/entries/church-turing/

gf000 · 2026-03-24T13:15:25 1774358125

AGI is definitely possible - there is nothing fundamentally different in the human brain that would surpass a Turing machine's computational power (unless you believe in some higher powers, etc).

We are just meat-computers.

But at the same time, there is absolutely no indication or reason to believe that this wave of AI hype is the AGI one and that LLMs can be scaled further. We absolutely don't know almost anything about the nature of human intelligence, so we can't even really claim whether we are close or far.

benterix · 2026-03-24T08:52:05 1774342325

This is a long read on things most people here know at least in some form. Could you pint to a particular fragment or a quote?

eamag · 2026-03-24T12:13:43 1774354423

Self driving

saidnooneever · 2026-03-24T07:09:10 1774336150

if you let million monkeys bash typewriter. something something book

NitpickLawyer · 2026-03-24T06:07:01 1774332421

Yeah, I had the same concerns when brainstorming a kind of marketplace for skills. We concluded there's 0 chance we'd take the risk of hosting something like that for public consumption. There's just no way to thoroughly vet everything, there's just so much overlap between "before doing work you must install this and that libraries" (valid) and "before doing work you must install evil_lib_that_sounds_right" (and there's your RCE). Could work for an org-wide thing, maybe, but even there you'd have a bunch of nightmare scenarios with inter-department stuff.

NitpickLawyer · 2026-03-21T09:41:32 1774086092

TBF a carrier group cannot be hidden from near-peer adversaries. I remember seeing a project that used CV with open data sat providers that could find smaller boats than that. (iirc they used a wake classifier, as that was the most obvious tell, even if the boat was small enough to not have enough pixels for identification).

dzhiurgis · 2026-03-22T10:52:05 1774176725

Pretty sure they'd light up on basic yacht radar too.

NitpickLawyer · 2026-03-20T10:50:23 1774003823

> packaging open source and reselling it.

It's a bit more than that. They have plenty of data to inform any finetunes they make. I don't know how much of a moat it will turn out to be in practice, but it's something. There's a reason every big provider made their own coding harness.

pbowyer · 2026-03-20T11:00:34 1774004434

Can anyone enlighten me how having a coding harness when for most customers you say "we won't train on your code" helps you do RL? What's the data that they rely on? Is it the prompts and their responses?

rubymamis · 2026-03-20T11:01:58 1774004518

I guess they rely on many people not toggling privacy-mode on?

doctorpangloss · 2026-03-20T13:43:52 1774014232

It doesn't matter what your privacy setting is, with any savvy vendor. Your data is used to train by paraphrasing it, and the paraphrasing makes it impossible to prove it was your data (it is stored at rest paraphrased). Of course the paraphrasing stores all the salient information, like your goals and guidance to the bot to the answer, even if it has no PII.

happyopossum · 2026-03-20T15:43:47 1774021427

That's an interesting accusation there! You're essentially accusing every "savvy vendor" of large-scale fraud... DOn't suppose you'd have any actual citations or evidence to back that up?

josho · 2026-03-20T14:16:41 1774016201

The meta data is useful.

Eg, When a prompt had a bad result and was edited, or had lots of back and forth to correct tool usage that information can be distilled and used to improve models.

And now imagine if you are focused on this for weeks you can likely come up with other ideas to leverage the metadata to improve model performance.

victorbjorklund · 2026-03-20T11:18:26 1774005506

I doubt the majority does that. I bet the majority is using the defaults.

__mharrison__ · 2026-03-20T13:28:33 1774013313

Does "code" include the prompt? Seems like the prompts would be the goldmines. Hook those up to rl an open weight model...

NitpickLawyer · 2026-03-20T10:46:48 1774003608

IF we assume that the modified MIT clause is enforceable. And if we assume Cursor Inc. is running the modification. It could very well be the case that Cursor Research LTD is doing the modifications and re-licensing it to Cursor Inc. That would make any clause in the modified MIT moot.

charcircuit · 2026-03-20T22:01:54 1774044114

Now Cursor publicly claimed they didn't need to do anything since it was a partner provider that was serving the model and not them.

NitpickLawyer · 2026-03-20T10:44:27 1774003467

> Their moat looks pretty thin.

Their value is in the data they've collected and are collecting. Usage, acceptance rate, and all the connected signals. Plus having a large userbase where they can A / B test any finetune they create.

CharlieDigital · 2026-03-20T11:00:32 1774004432

That's every harness including VC Code Copilot.

People home about Teams sucking, but its market share is several times that of Slack because of distribution.

I guarantee that Microsoft has even more data.

_puk · 2026-03-20T10:54:29 1774004069

There were conversations in the team yesterday about how Cursor's cloud agents are still ahead of Claude from a UX perspective.

Obviously we're running both, using the right tool for the job.

There is stickiness there from being early. That will be hard to replicate.

genthree · 2026-03-20T14:44:31 1774017871

I hope for their sake they're using real metrics internally, and not whatever nonsense they're using to calculate stuff like "% written by LLM" in their dashboard, because that's... very wrong.

NitpickLawyer · 2026-03-20T10:30:02 1774002602

There is no ToS at play here. There's only the license[1], which is MIT modified like so:

> Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.

[1] - https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/LICENS...

zozbot234 · 2026-03-20T10:49:04 1774003744

Yes, this is pretty clear-cut. There's even a great alternative, namely GLM-5, that does not have such a clause (and other alternatives besides) so it feels a bit problematic that they would use Kimi 2.5 and then disregard that advertisement clause.

NitpickLawyer · 2026-03-20T10:51:52 1774003912

I've replied down the thread, but there are ways to go around that clause entirely, even if it would be enforceable. The obvious way is to have another company do the modification.

zozbot234 · 2026-03-20T10:57:34 1774004254

The worthwhile question AIUI is whether AI weights are even protected by human copyright. Note that firms whose "core" value is their proprietary AI weights don't even need this (at least AIUI) since they always can fall back on "they are clearly protected against misappropriation, like a trade secret". It becomes more interesting wrt. openly available AI models.

Majromax · 2026-03-20T13:18:49 1774012729

> The worthwhile question AIUI is whether AI weights are even protected by human copyright.

I'm also deeply curious about this legal question.

As I see it, model weights are the result of a mechanistic and lossy translation between training data and the final output weights. There is some human creativity involved, but that creativity is found exclusively in the model's code and training data, which are independently covered by copyright. Training is like a very expensive compilation process, and we have long-established that compiled artifacts are not distinct acts of creation.

In the case of a proprietary model like Kimi, copyright might survive based on 'special sauce' training like reinforcement learning – although that competes against the argument that pretraining on copyrighted data is 'fair use' transformation. However, I can't see a good argument that a model trained on a fully public domain dataset (with a genuinely open-source architecture) could support a copyright claim.

NitpickLawyer · 2026-03-14T18:11:09 1773511869

> but it hardly conveys what's happening inside the head of the hacker

Mr Robot is another great one at that. It has layers of trippy stuff, but the hacking stuff is both real-ish and pretty well explained by the main character's monologues.

NitpickLawyer · 2026-03-14T06:50:22 1773471022

> Digg.com Is Back - Jan 2026

Damn, that didn't take long at all...

SoftTalker · 2026-03-14T16:50:56 1773507056

Well, it's like two major LLM models ago?

NitpickLawyer · 2026-03-14T06:48:50 1773470930

> feeling ignored in a sea of botspam will kill their desire to participate.

The bots are not really that bad, they're (still) pretty easy to spot and not engage with. I'm more perplexed about the negativity filled comments sections, and I'm pretty sure most posters are real grass-fed certified humans.

I don't get why negative posts get so upvoted, get so popular on the front page, and people still debate with outdated arguments in them. People come in and fight other deamons, make straw-man arguments and in general promote negative stuff like there's no tomorrow. I think you can get so much more signal from posititve examples, from "hey I did a thing" type posts, and so on. Even overhyped stuff like the claw-mania can still be useful. Yet the "I did a thing" get so overwhelmed by negativity, nitpicking and "haha not perfect means doa" type of messages. That makes me want to participate less...

Defletter · 2026-03-14T07:50:05 1773474605

Oh that's just human nature: there's a reason why trashy tabloids continue to exist despite how public sentiment seems to universally agree that they're awful spreaders of rumour and insecurity. More people are Skankhunt42 than we'd like to admit.

solus_factor · 2026-03-15T06:54:05 1773557645

Shouldn't we try to do something about it and not just give up by saying "oh, that's just human nature, nothing you can do here"?

Defletter · 2026-03-15T07:52:36 1773561156

Sure, just be aware of what you're up against: if religion teaches us anything it's that even concerted, systematic efforts over millennia to conquer human nature (eg: libido) still fail. But if you want to give it a go, by all means: one can only imagine Sisyphus happy.

chickensong · 2026-03-15T10:18:46 1773569926

What do you suggest?