Hacker News | thethirdone's comments

Compare that to ~30% of all energy use for transportation. So approximately 40% * 4% = 1.6% vs 30%. I find your correction to be more wrong than the initial statement.

> And most of that new capacity will be natural gas. That increase would basically wipe out the reduction in CO2 emissions the USA has had since 2018.

Emissions in 2018 were ~5,250M metric tons and in 2024 they were ~4,750M [0]. That is a reduction of about 10% of total emissions. Without going into calculations of green electricity and such, it's still safe to say AI using 10% of the grid would not completely wipe out that reduction.
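The arithmetic behind that ~10% figure, using the Statista numbers:

```python
# US energy-related CO2 emissions, million metric tons (figures from [0])
e_2018 = 5250
e_2024 = 4750

reduction = (e_2018 - e_2024) / e_2018
print(f"{reduction:.1%}")  # → 9.5%
```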

[0]: https://www.statista.com/statistics/183943/us-carbon-dioxide...


> Compare that to ~30% of all energy use for transportation

Transportation, especially ALL transportation, does a LOT. You should be looking at ROI, not absolute values. I think it's undeniable that the positive economic effect of every car, truck, train, and plane is unfathomably huge. That's trains moving minerals, planes moving people, trucks transporting goods, and hundreds of combinations thereof, all interconnected. Literally no economic activity would happen without transportation, including the transition to green energy sources, which would itself improve emissions from transportation.

I think transportation might be more emissions-efficient at generating value than AI by a factor exceeding the 7.5x energy use. Moving rocks from (place with rocks) to (place that needs rocks) continues to be just an insanely good thing for humanity.

Also, I'm not sure about your math. 4% would be 4% of the whole like in a pie chart, not 4% of the remainder after removing one slice. 4% AI, 30% transportation, 66% other. I don't know where that 40% is from.


> Also, I'm not sure about your math. 4% would be 4% of the whole like in a pie chart, not 4% of the remainder after removing one slice. 4% AI, 30% transportation, 66% other. I don't know where that 40% is from.

40% is the share of US energy use delivered in the form of electricity. It was a rough number that I pulled from memory, but it is roughly right. Check https://www.eia.gov/energyexplained/us-energy-facts/

AI is not currently 4% of the energy market of the US, only of the grid. I should have been clearer about the ALL ENERGY vs GRID distinction.

> I think it might be more emissions-efficient at generating value than AI by a factor exceeding the 7.5x energy use. Moving rocks from (place with rocks) to (place that needs rocks) continues to be just an insanely good thing for humanity.

I really made no statement on the value of doing things. Transportation is obviously very valuable. I just wanted a more fact-based conversation.


> Compare that to ~30% of all energy use for transportation. So approximately 40% * 4% = 1.6% vs 30%. I find your correction to be more wrong than the initial statement.

I don't follow. The comparison is 30% of energy use for transportation vs 4% for AI, and soon 30% for transportation vs 10% for AI.


The grid is not all energy use. To get the numbers on an even playing field you need to account for the fact that only ~40% of energy use goes through the grid.

And that leaves roughly a 7.5:1 ratio assuming projections run true. It very well might be possible to get efficiency wins from the transportation sector that outweigh growth in AI.
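A quick sketch of that normalization, using the rough shares quoted in this thread:

```python
transport_share = 0.30   # transportation's share of ALL US energy use
grid_share = 0.40        # rough share of US energy delivered as electricity
ai_grid_now = 0.04       # AI's current share of the grid
ai_grid_proj = 0.10      # projected share of the grid

# AI's share of all energy = (share of grid) * (grid's share of all energy)
print(transport_share / (grid_share * ai_grid_now))    # ≈ 18.75 today
print(transport_share / (grid_share * ai_grid_proj))   # ≈ 7.5 under the projection
```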


I agree that the movements look quite robotic (though not as much as you might expect), but I don't think any movies have depicted robots moving like that. A much more common depiction is moving only a single joint at a time.

> Sharp, unsure movements, a lot of hesitation, ...

I like these particular descriptors. Another I would add is holding poses unnaturally still. While waiting for the ball, the robot holds its racket extremely consistently relative to its body even while sharply turning.


You would not. You don't normally post lots of comments. The occasional return after a long period of inactivity is not in itself suspicious.


The d suffix makes it not compile under clang. The PRs seem like mostly small changes that are clear improvements.


The criteria were laid out in 2019 [0]. It was less clear then.

> If you are a "rustacean" and feel that Rust already meets the preconditions listed above, and that SQLite should be recoded in Rust, then you are welcomed and encouraged to contact the SQLite developers privately and argue your case.

It seems like the criteria are less things the SQLite developers claim Rust can't do, and more non-negotiable properties that need to be considered before even bringing the idea of a Rust version to the team.

I think it is at least arguable that Rust does not meet the requirements. And they did explicitly invite private argument if you feel differently.

0: https://web.archive.org/web/20190423143433/https://sqlite.or...


Ah, I assumed the page was written recently due to this message at the bottom:

>> This page was last updated on 2025-05-09 15:56:17Z <<

> I think it is at least arguable that Rust does not meet the requirements

Absolutely. The lack of clean OOM handling alone might be a dealbreaker for sqlite. And I suspect sqlite builds for some weird platforms that rustc doesn't support.

But I find it pretty weird reading comments about how Rust needs to prove it performs similarly to C. Benchmarks are just a Google search away, folks.

> And they did explicitly invite private argument if you feel differently.

Never.

It's not up to me what language sqlite is written in. Emailing the sqlite authors to tell them to rewrite their code in a different language would be incredibly rude. They can write sqlite in whatever language they want. My only real choice is whether or not I want to use their code.


Based on Table 1: this method is actually worse than generating a random number (0-100%, independent of the program) and testing whether it is less than 98.8%. That would achieve a better detection rate without increasing the false positive rate.
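A quick simulation illustrates the baseline: a "detector" that ignores its input and fires with fixed probability p has a true-positive rate and a false-positive rate that both converge to p (the 98.8% figure here is just the threshold from the comparison above):

```python
import random

random.seed(0)
p, n = 0.988, 100_000

# A "detector" that ignores the sample entirely and flags with probability p
flag = lambda _sample: random.random() < p

tpr = sum(flag(s) for s in range(n)) / n  # rate measured on positives
fpr = sum(flag(s) for s in range(n)) / n  # rate measured on negatives

# Both land near 98.8%: detection and false-positive rate move in lockstep
```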

It doesn't seem worth it to try to follow the math to see if there is something interesting.


Which of those have been achieved, in your opinion?

I think formalizing arbitrary proofs from the mathematical literature is probably the closest to solved. Research into IMO problems and Lean formalization work have been pretty successful.

Then, probably reading a novel and answering questions is the next most successful.

Reliably constructing 10k bug-free lines is probably the least successful. AI tends to produce more bugs than human programmers, and I have yet to meet a programmer who can reliably produce fewer than 1 bug per 10k lines.


Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math outside of the very "cleanest" fields (e.g. algebra, logic, combinatorics) have not seen much success in formalizing existing theory developments.


> Reliably constructing 10k bug free lines is probably the least successful.

You really need to try Claude Code, because it absolutely does that.


I have seen many people try to use Claude Code and get LOTS of bugs. Show me any >10k-line project you have made with it and I will put in the effort to find one bug, free of charge.


> The simplest example being that LLM's somehow function in a similar fashion to human brains. They categorically do not. I do not have most all of human literary output in my head and yet I can coherently write this sentence.

The ratio of cognition to knowledge is much higher in humans than in LLMs. That is for sure. It is improving in LLMs, particularly in small distillations of large models.

A lot of where the discussion gets hung up is just words. I just used "knowledge" to mean the ability to recall and recite a wide range of facts, and "cognition" to mean the ability to generalize, notice novel patterns, and execute algorithms.

> They don't actually understand anything about what they output. It's just text.

In the case of number multiplication, a bunch of papers have shown that the correct algorithm for the first and last digits of the product is embedded in the model weights. I think that counts as "understanding"; most humans I have talked to do not have that understanding of numbers.
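The last-digit case is easy to verify directly: the last digit of a product depends only on the last digits of the factors, so the pattern is learnable without full multiplication. A quick check:

```python
# The product's last digit is a function of the factors' last digits alone
for a in range(200):
    for b in range(200):
        assert (a * b) % 10 == ((a % 10) * (b % 10)) % 10
```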

> It's just an algorithm.

> I am surprised so many in the HN community have so quickly taken to assuming as fact that LLM's think or reason. Even anthropomorphising LLM's to this end.

I don't think something being an algorithm means it can't reason, know, or understand. I could come up with perfectly rigorous definitions of those words that almost no one from 2010 would object to, but that current LLMs would pass.

I have found anthropomorphizing LLMs to be a reasonably practical way to leverage the human skill of empathy to predict LLM performance. Treating them solely as text predictors doesn't offer any similar prediction; it is simply too complex to fit into a human mind. Paying a lot of attention to benchmarks, papers, and personal experimentation can give you enough data to make predictions from data, but it is limited to current models, is a lot of work, and isn't much more accurate than anthropomorphization.


> The ratio of cognition to knowledge is much higher in humans that LLMs. That is for sure. It is improving in LLMs, particularly small distillations of large models.

It isn't a case of ratio; it is a fundamentally different method of working, hence my point about not needing all of human literary output to do the equivalent of what an LLM does. Consider even the case of a person born blind: they have an even more severe deficiency of input, yet they are equivalent in cognitive capacity to a sighted person, and certainly to any LLM.

> In the case of number multiplication, a bunch of papers have shown that the correct algorithm for the first and last digits of the number are embedded into the model weights. I think that counts as "understanding";

Why are those numbers in the model weights? If the model was trained on birdsong instead of humanity's output, would it then be able to multiply? Humans provide the connections, the reasoning, the thought, the insights, and the subsequent correlations; THEN we humans try to make a good pattern matcher/guesser (the LLM) to match those. We tweak it so it matches patterns more and more closely.

> most humans I have talked to do not have that understanding of numbers.

This common retort (most humans also make mistakes, most humans also do x, y, z) means nothing. Take the opposite implication of such retorts: for example, most humans can't multiply 10-digit numbers, therefore most calculators 'understand' math better than most humans.

> I don't think something being an algorithm means it can't reason, know or understand. I can come up with perfectly rigorous definitions of those words that wouldn't be objectionable to almost anyone from 2010, but would be passed by current LLMs.

My digital thermometer uses an algorithm to determine the temperature. It does NOT reason when doing so. An algorithm is a series of steps. You can write them on a piece of paper. The paper will not be thinking if that is done.

> I have found anthropomorphizing LLMs to be a reasonably practical way to....

I think anthropomorphising is letting people assume they are more than they are (next-token generators). In fact, at the extreme end, this anthropomorphising has exacerbated mental health conditions and has unfortunately even led to humans killing themselves.


You did not actually address the core of my points at all.

> It isn't a case of ratio; it is a fundamentally different method of working, hence my point about not needing all of human literary output to do the equivalent of what an LLM does.

You can make ratios of anything. I agree that human cognition is different from LLM cognition, though I would think of it more like a phase difference than fundamentally different phenomena. Think liquid water vs steam: the density (a ratio) is vastly different, and they have other, harder-to-describe differences (surface tension, filling a volume, incompressible vs compressible).

> Humans provide the connections, the reasoning the thought the insights and the subsequent correlations THEN we humans try to make a good pattern matcher/ guesser (the LLM) to match those.

Yes, humans provide the training data and benchmarks for measuring LLM improvement. Meaning about the world somehow has to make it into training for there to be any understanding. However, humans talking about patterns in numbers is not how the LLMs learned this. It is very much from seeing lots of examples and deducing the pattern (during training, not inference). The fact that a general pattern is embedded in the weights implies that some general understanding of many things is baked into the model.

> This common retort: most humans also makes mistakes, or most humans also do x, y, z means nothing.

It is not a retort, but an argument towards what "understanding" means. From what you have said, my guess is that your definition makes "understanding" whatever humans do and computers are incapable of (by definition). If LLMs could outcompete humans in all professional tasks, I think it would be hard to say they understand nothing. Humans are a worthwhile point of comparison, and human exceptionalism can only really hold up until being surpassed.

I would also point out that some humans DO understand the properties of numbers I was referring to. In fact, I figured it out in second grade while doing lots of extra multiplication problems as punishment for being a brat.

> My digital thermometer uses an algorithm to determine the temperature. ... The paper will not be thinking if that is done.

I did not say "all algorithms are thinking." The stronger version of what I was saying is "some algorithms can think." You have simply asserted the opposite with no reasoning.

> In fact at the extreme end this anthropomorphising has led to exacerbating mental health conditions and unfortunately has even led to humans killing themselves.

I do concede that anthropomorphizing can be problematic, especially if you do not have the background in CS and ML to understand what's under the hood. However, you completely skipped past my rather specific explanation of how it can be useful. On HN in particular, I do expect people to bring enough technical understanding to the table to not just treat LLMs as people.


I disagree with the framing in 2.1 a lot.

  > Models look god-tier on paper:
  >  they pass exams
  >  solve benchmark coding tasks
  >  reach crazy scores on reasoning evals
Models don't look "god-tier" from benchmarks. Surely an 80% is not godlike. I would really like more human comparisons for these benchmarks to get a good idea of what an 80% means though.

I would not say that any model shows a "crazy" score on ARC-AGI.

I broadly have seen incremental improvements in benchmarks since 2020, mostly at a level I would judge to be below average human reasoning but above average human knowledge. No one would call GPT-3 godlike, and it is quite similar to modern models on benchmarks; it is not a difference like 1% vs 90%. I think most people would consider GPT-3 to be closer to Opus 4.5 than Opus 4.5 is to a human.


Roughly I'd agree, although I don't have hard numbers, and I'd say GPT-4 in 2023 vs GPT-3 was the last major "wow" release from a purely-model perspective. But models have also gotten a lot faster, which has its own value. And the tooling around them has gotten MASSIVELY better. Remember the "prompt engineering" craze? Now there are a lot of tools that will take your two-sentence prompt and figure out (even asking you questions sometimes) how to best execute it based on local context, like a code repository, and iterate by "re-prompting" themselves over and over, in a fraction of the time it would have taken you by manual "prompt engineering."

Though I do not fully know where the boundary between "a model prompted to iterate and use tools" and "a model trained to be more iterative by design" is. How meaningful is that distinction?

But the people who don't get this are the less-technical/less-hands-on VPs, CEOs, etc, who are deciding on layoffs, upcoming headcount, "replace our customer service or engineering staffs with AI" things. A lot of those moves are going to look either really silly or really genius depending on exactly how "AGI-like" the plateau turns out to be. And that affects a LOT of people's jobs/livelihood, so it's good to see the hype machine start to slow down and get more realistic about the near-term future.


> I'd say GPT-4 in 2023 vs GPT-3 as the last major "wow" release from a purely-model perspective. But they've also gotten a lot faster, which has its own value. And the tooling around them has gotten MASSIVELY better

Tooling vs model is a false dichotomy in this case. The massive improvements in tooling are directly traceable back to massive improvements in the models.

If you took the same tooling and scaffolding and stuck GPT-3 or even GPT-4 in it, they would fail miserably and from the outside the tooling would look abysmal, because all of the affordances of current tooling come directly from model capability.

All of the tooling approaches of modern systems were proposed and prototypes were made back in 2020 and 2021 with GPT-3. They just sucked because the models sucked.

The massive leap in tooling quality directly reflects a concomitant leap in model quality.


How do you avoid overfitting with the automated prompts? From what I've seen in the past, it seems to add lots of exceptions rather than generalizing as much as a human would.


Ask the agent "Is this over-fitting?"

I'm not joking.


I dunno, some of the questions on things like Humanity's Last Exam sure strike me as "godlike." Yes, I'm happy that I can still crush LLMs on ARC-AGI-2 but I see the writing on the wall there, too. Barely over a year ago LLMs were what, single digit percentages on ARC-AGI-1?


I would hope god can do better than 40% on a test. If you select human experts from the relevant fields, together they would get at least a passing grade (70%). A group of 20 humans is not godlike.


In this paper both the diffusion and the auto-regressive models are transformers with O(n^2) performance for long sequences. They share the "Exact KV Cache" for committed tokens.

Diffusion just allows you to spend more compute at the same time so you don't redundantly access the same memory. It can only improve speed beyond the memory bandwidth limit by committing multiple tokens each pass.

Other linear models like Mamba get away from the O(n^2) effects, but the type of neural architecture is orthogonal to the method of generation.
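The bandwidth point can be sketched with toy numbers (the model size, bandwidth, and tokens-per-pass figures below are all hypothetical):

```python
# When decoding is memory-bandwidth bound, each forward pass must stream
# the whole model's weights from memory, whatever it computes along the way.
weight_bytes = 14e9 * 2      # hypothetical 14B-parameter model in fp16
bandwidth = 1.0e12           # hypothetical 1 TB/s of memory bandwidth
pass_time = weight_bytes / bandwidth   # seconds per forward pass

autoregressive_tps = 1 / pass_time     # one committed token per pass
diffusion_tps = 4 / pass_time          # e.g. four committed tokens per pass

# Committing k tokens per pass multiplies throughput by k at the same bandwidth
```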

