_air · 2026-03-23T15:48:15 1774280895

This is awesome! How far away are we from a model of this capability level running at 100 t/s? It's unclear to me if we'll see it from miniaturization first or from hardware gains

Tade0 · 2026-03-23T16:06:40 1774282000

Only way to have hardware reach this sort of efficiency is to embed the model in hardware.

This exists[0], but the chip in question is physically large and won't fit on a phone.

[0] https://www.anuragk.com/blog/posts/Taalas.html

tclancy · 2026-03-23T16:35:52 1774283752

I think you're ignoring the inevitable march of progress. Phones will get big enough to hold it soon.

tren_hard · 2026-03-23T18:34:44 1774290884

Instead of slapping on an extra battery pack, it will be an onboard llm model. Could have lifecycles just like phones.

Getting bigger (foldable) phones, without losing battery life, and running useable models in the same form-factor is a pretty big ask.

RALaBarge · 2026-03-23T17:34:45 1774287285

I think the future is the model becoming lighter not the hardware becoming heavier

Tade0 · 2026-03-23T17:51:47 1774288307

The hardware will become heavier regardless I'm afraid.

TeMPOraL · 2026-03-24T10:06:41 1774346801

Good. It's ridiculously tiny and lightweight these days.

Especially with phones; the first thing everyone does after buying their new uber thin iPhone is buying a case for it, which doubles its thickness.

intrasight · 2026-03-23T16:26:44 1774283204

I think for many reasons this will become the dominant paradigm for end user devices.

Moore's law will shrink it to 8mm soon. I think it'll be like a microSD card you plug in.

Or we develop a new silicon process that can mimic synaptic weights in biology. Synapses have plasticity.

bigyabai · 2026-03-23T16:30:54 1774283454

One big bottleneck is SRAM cost. Even an 8b model would probably end up being hundreds of dollars to run locally on that kind of hardware. Especially unpalatable if the model quality keeps advancing year-by-year.

> Or we develop a new silicon process that can mimic synaptic weights in biology. Synapses have plasticity.

It's amazing to me that people consider this to be more realistic than FAANG collaborating on a CUDA-killer. I guess Nvidia really does deserve their valuation.

intrasight · 2026-03-23T16:48:02 1774284482

> bottleneck is SRAM cost

Not for this approach

ottah · 2026-03-23T16:38:36 1774283916

That's actually pretty cool, but I'd hate to freeze a model's weights into silicon without having an incredibly specific and broad use case.

patapong · 2026-03-23T17:39:00 1774287540

Depends on cost IMO - if I could buy a Kimi K2.5 chip for a couple of hundred dollars today I would probably do it.

whatever1 · 2026-03-23T17:42:04 1774287724

I mean if it was small enough to fit in an iPhone why not? Every year you would fabricate the new chip with the best model. They do it already with the camera pipeline chips.

superxpro12 · 2026-03-23T17:46:50 1774288010

Sounds like just the sort of thing FPGAs were made for.

The $$$ would probably make my eyes bleed tho.

chrsw · 2026-03-23T18:04:45 1774289085

Current FPGAs would have terrible performance. We need some new architecture combining ASIC LLM perf and sparse reconfiguration support maybe.

0x457 · 2026-03-23T18:49:46 1774291786

Wouldn't it be the opposite of freezing weights?

originalvichy · 2026-03-23T16:09:35 1774282175

On smartphones? It’s not worth it to run a model this size on a device like this. A smaller fine-tuned model for specific use cases is not only faster, but possibly more accurate when tuned to specific use cases. All those gigs of unnecessary knowledge are useless to perform tasks usually done on smartphones.

ottah · 2026-03-23T16:36:14 1774283774

Probably 15 to 20 years, if ever. This phone is only running this model in the technical sense of running, but not in a practical sense. Ignore the 0.4 tk/s, that's nothing. What really makes this example bullshit is the fact that there is no way the phone has enough RAM to hold any reasonable amount of context for that model. Context requirements are not insignificant, and as the context grows, the speed of the output will be even slower.

Realistically you need +300GB/s fast access memory to the accelerator, with enough memory to fully hold at least greater than 4bit quants. That's at least 380GB of memory. You can gimmick a demo like this with an SSD, but the SSD is just not fast enough to meet the minimum specs for anything more than showing off a neat trick on Twitter.

The only hope for a handheld execution of a practical, and capable AI model is both an algorithmic breakthrough that does way more with less, and custom silicon designed for running that type of model. The transformer architecture is neat, but it's just not up for that task, and I doubt anyone's really going to want to build silicon for it.
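
For a rough sense of the arithmetic in this comment, here is a back-of-the-envelope sketch. It uses the common rule of thumb that memory-bound decoding cannot exceed bandwidth divided by the bytes read per token. The 300 GB/s and 380 GB figures come from the comment; treating all 380 GB as read for every token is an assumption (a dense worst case), and a sparse mixture-of-experts model would read only a fraction of that. The function names are made up for illustration.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when each token must stream its weights from memory."""
    return bandwidth_gb_s / gb_read_per_token


def bandwidth_needed_gb_s(target_tokens_per_s: float, gb_read_per_token: float) -> float:
    """Bandwidth required to hit a target decode speed under the same assumption."""
    return target_tokens_per_s * gb_read_per_token


# Figures quoted in the comment above.
BANDWIDTH_GB_S = 300.0
WEIGHTS_GB = 380.0  # ASSUMPTION: every weight is read for every token (dense worst case)

dense_rate = decode_tokens_per_second(BANDWIDTH_GB_S, WEIGHTS_GB)
needed = bandwidth_needed_gb_s(100.0, WEIGHTS_GB)

print(f"{dense_rate:.2f} tokens/s at {BANDWIDTH_GB_S:.0f} GB/s")  # 0.79 tokens/s at 300 GB/s
print(f"{needed:.0f} GB/s needed for 100 tokens/s")               # 38000 GB/s needed for 100 tokens/s
```

This is only a ceiling: it ignores compute, caching, and any sparsity in which weights are actually touched per token.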

alwillis · 2026-03-23T18:46:26 1774291586

> Realistically you need +300GB/s fast access memory to the accelerator, with enough memory to fully hold at least greater than 4bit quants.

The latest M5 MacBook Pros start at 307 GB/s memory bandwidth, the 32-core GPU M5 Max gets 460 GB/s, and the 40-core M5 Max gets 614 GB/s. The CPU, GPU, and Neural Engine all share the memory.

The A19/A19 Pro in the current iPhone 17 line is essentially the same processor (minus the laptop and desktop features that aren’t needed for a phone), so it would seem we're not that far off from being able to run sophisticated AI models on a phone.

alpineman · 2026-03-24T08:36:55 1774341415

Agree with the first part - but I can run GPT OSS 20b, a highly capable model, on my laptop with 32GB of RAM at speeds that for all practical purposes are as fast as GPT-5.4 and good enough for 90%+ of non-technical use cases.

As such I can't agree with "The only hope for a handheld execution of a practical, and capable AI model is both an algorithmic breakthrough" - we are much closer than 15/20 years to get these on a phone

zozbot234 · 2026-03-24T09:28:58 1774344538

With this work you can run a medium-sized model like GPT OSS 20b at native speed even while keeping those 32GB RAM almost fully available for other uses - the model seamlessly starts to slow down as RAM requirements increase elsewhere in the system and the fs cache has to evict more expert layers, and reaches full speed again as the RAM is freed up. It adds a key measure of flexibility to the existing AI local inference picture.
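
To illustrate the "slows down gracefully as RAM shrinks" behaviour, here is a toy simulation. It is not the project's code: it models the file-system cache as a strict LRU over whole experts and assumes uniformly random expert routing, both of which are simplifications. All names and sizes are hypothetical.

```python
import random
from collections import OrderedDict


def simulate_hit_rate(trace, capacity):
    """Fraction of expert lookups served from an LRU cache holding `capacity` experts."""
    cache = OrderedDict()
    hits = 0
    for expert_id in trace:
        if expert_id in cache:
            hits += 1
            cache.move_to_end(expert_id)   # mark as most recently used
        else:
            cache[expert_id] = True        # miss: would be read from storage
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used expert
    return hits / len(trace)


# ASSUMPTIONS: 64 experts, uniform routing, fixed seed for repeatability.
N_EXPERTS = 64
rng = random.Random(0)
trace = [rng.randrange(N_EXPERTS) for _ in range(10_000)]

# Shrinking capacity stands in for other programs claiming RAM.
rates = {cap: simulate_hit_rate(trace, cap) for cap in (8, 16, 32, 64)}
for cap, rate in rates.items():
    print(f"cache holds {cap:2d} experts -> hit rate {rate:.2f}")
```

Every miss stands for a read from storage, so a lower hit rate means slower tokens. Real routing is usually skewed toward a subset of experts, which would make a small cache do better than this uniform model suggests.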

zozbot234 · 2026-03-23T22:14:26 1774304066

KV-cache is still quite small compared to the weights. It can stay in memory for reasonable context length, or be streamed to storage as a last resort. This actually doesn't impact performance too much, since we were already limited by having to stream in the much larger weights.
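
A quick way to check the "small compared to the weights" claim is the usual size formula for a standard attention KV cache. The model shape below is hypothetical and chosen only for illustration; the 380 GB weight figure is the one quoted earlier in the thread. Architectures that compress or share the cache would come out smaller.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """One key and one value vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# ASSUMPTION: hypothetical model shape, not taken from the thread.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 60, 8, 128
CONTEXT_LEN = 32_768
WEIGHTS_GB = 380.0  # weight size quoted earlier in the thread

cache_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
cache_gb = cache_bytes / 1e9

# Prints: KV cache: 8.05 GB (2.1% of weights)
print(f"KV cache: {cache_gb:.2f} GB ({100 * cache_gb / WEIGHTS_GB:.1f}% of weights)")
```

Under these assumptions the cache is a few percent of the weights at 32k tokens of context, and it grows linearly with context length.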

smlacy · 2026-03-23T22:02:16 1774303336

This should be the top comment

svachalek · 2026-03-23T17:04:53 1774285493

A long time. But check out Apollo from Liquid AI, the LFM2 models run pretty fast on a phone and are surprisingly capable. Not as a knowledge database but to help process search results, solve math problems, stuff like that.

root_axis · 2026-03-23T18:31:24 1774290684

It will never be possible on a smart phone. I know that sounds cynical, but there's basically no path to making this possible from an engineering perspective.

NetMageSCW · 2026-03-23T21:25:05 1774301105

No one needs more than 640K!

DrewADesign · 2026-03-24T00:08:30 1774310910

Quantum computing is right around the corner!

bushbaba · 2026-03-24T07:39:50 1774337990

This comment will age well.

iooi · 2026-03-23T17:30:50 1774287050

Is 100 t/s the standard for models?

_air · 2026-01-02T23:24:04 1767396244

It would be nice if there were a domain specific language that could help with the internal consistency problem

nick2837 · 2026-01-09T19:04:04 1767985444

i agree. it's still very early with AI programming in general, so this might evolve in the next few years

_air · on Feb 14, 2025

Getting hugged: https://web.archive.org/web/20250214221951/https://aresluna....

_air · on May 24, 2024

Is Google losing significant web traffic to ChatGPT and others? I don't understand why else they'd test a product like this so widely.

al_borland · on May 24, 2024

I think they are terrified of GenAI eliminating people's desire to do traditional web searches. It's basically the quick answers Google has been offering for years, but much more in depth and complete (if correct).

My dad has been talking a lot about how he wants to start using ChatGPT more instead of Google, and he thinks it is going to make his life much better.

rsynnott · on May 24, 2024

I would assume to appease the markets, whose limited collective wisdom is currently zeroing in on AI. Google has been known to overreact before; remember Google+, at the height of the social media boom?

pylua · on May 24, 2024

I don’t know, but I am curious as to how they are going to monetize it.

rovr138 · on May 24, 2024

I'm curious how websites will stay in business once it's kinda good or people just start thinking it's true

carlosjobim · on May 24, 2024

You will pay Google to insert your ads into AI generated answers. No need for the hassle of creating a website anymore. What a relief for businesses!

pylua · on May 24, 2024

That’s an interesting thought. Just order your fast food through Gemini, or schedule your car repair or book a hotel directly through gemini without going to a different website.

I haven’t even considered that.

rovr138 · on May 24, 2024

I swear I saw a screenshot of Microsoft’s with ads inline. I couldn’t find it after.

pylua · on May 24, 2024

Seems like eventually, it would become the dominant predator, and deplete its food supply?

rovr138 · on May 24, 2024

With these companies crawling the internet, it’ll be a race to create content. Then reuse the models to generate more, new content for your website

It’ll be diminishing returns for visitors

pylua · on May 24, 2024

Worse than that is that it will diminish the desire to create.

balder1991 · on May 24, 2024

Maybe the websites remaining will be genuinely interesting texts that you’d actually want to read instead of just query for an answer.

rovr138 · on May 24, 2024

Sure, things like blogs.

But if you need info for a movie or something, where will the data come from? Back to newspapers?

balder1991 · on May 24, 2024

I would imagine Google doing something like: “just use this service to keep your information up to date and you’ll get monetized automatically based on users querying your data”.

rovr138 · on May 24, 2024

Right now they have us convinced to optimize how we give them data for them to be able to parse it (schema.org, etc). So we give them the data, optimize it for them ingesting it, then we lose traffic because people get the data on the search page.

We'll probably have to pay them to give them the data...

balder1991 · on May 24, 2024

Yeah, but thinking about it now, this happens because Google has a monopoly on search and they can afford to dictate the rules. I think that with more companies competing with AI the field is a bit more even, and Google would be the one needing this kind of “advantage”.

seattle_spring · on May 24, 2024

I honestly wonder if there's eventually going to be "sponsored" answers. I shudder to even think of it, but it seems naive to think we are not going in that direction.

nojs · on May 24, 2024

https://www.adweek.com/media/openai-preferred-publisher-prog...

pylua · on May 24, 2024

It also may not be as blatant or as obvious as sponsored results are now. It could be mixed in seamlessly.

cdme · on May 24, 2024

AlienRobot · on May 24, 2024

What are you going to do? Switch to Bing?

Good luck, Bing puts ChatGPT on top of their results!

Because what are you going to do? Switch to Brave Search? To Kagi?

Well, guess who else is trying to put AI into their search engines!

__loam · on May 24, 2024

It's like a percent. For Google I guess that's significant.

IAmNotACellist · on May 24, 2024

If they had any sense at all (questionable) they would be absolutely pants-shittingly terrified that their prized product has been completely undercut by an upstart in a matter of months, and that Microsoft has a deep partnership with that competitor. I'd think even the leadership at Google can figure out that ChatGPT eats their lunch in terms of cutting through the SEO bullshit and giving actual information.

If they were even approaching baseline-clever, they'd realize that they've caused people to be so pissed at the declining quality of their search for years and that they'll happily jump ship.

__loam · on May 24, 2024

"Completely" is a wild overstatement of what OpenAI has done to Google. There was an article on Hacker News a few weeks ago about the actual trends that showed GPT had barely scratched Google's share of traffic and that OpenAI was hemorrhaging users after the initial sign up.

mike_d · on May 24, 2024

Google was always better than Yahoo/Bing at dealing with webspam (whether Google can continue to beat current webspam is a different debate). Bing is happily training on the things they don't know are bad results. Garbage in garbage out.

The only competition Google needs to worry about is Google's leadership. Once Cloud brought in TK and they started actively recruiting from Microsoft and Oracle it was like an infection of stupid they haven't been able to fight.

shooker435 · on May 24, 2024

Not sure if it's strictly the recruiting pool that made things break down, but I see it more as a result of COVID/WFH + the recruiting pool. Google's strong in-office culture once helped new joiners learn the culture and challenge others in a respectful way - now it's a political minefield.

Remote work is great, but it probably accelerated Google's culture decline. If you've been there a while, you'll notice the wild difference in employee attitudes when comparing pre-2019 employees to post-2020 employees.

yterdy · on May 24, 2024

Depending on who you ask, Google's search issues started as early as 2010 (Instant) or 2016 ("brands"). Many of the things people complain about regarding Google's culture - shuttering projects arbitrarily, hiring issues (the interview gauntlet, anti-competitive practices, etc.), the slow erosion of Don't Be Evil - are 2010s products, also. I don't think this is WFH, at its root.

mike_d · on May 24, 2024

> Remote work is great, but it probably accelerated Google's culture decline.

Absolutely agree. A lot of the top talent bailed when they started demanding return to office. Google played the "if you don't X we will fire you" with a bunch of L7+ that ... surprise, could easily get jobs elsewhere or had enough GSUs to flat out retire.

shooker435 · on May 24, 2024

Yeah, I left to start my own company (which I was already contemplating) but some of the changes in early 2023 were the catalyst that I needed to make the jump. I made sure to give the best feedback that I could in my exit interview, but I doubt it has much of an impact at a company that large.

AlienRobot · on May 24, 2024

Personally, every time I see ChatGPT's output I just skip it. I look at it and I'm not sure whether it's quoting things literally or changing them, so I can't trust anything the summary says, and if I'm going to click the links anyway, I don't need the summary.

_air · on Feb 9, 2024

If I zoom in closely, I think I see a yes. Seriously, some data looks great in a pie chart

_air · on Jan 30, 2024

http://archive.today/baKGL

_air · on Dec 14, 2023

Also: https://xkcd.com/1189/

_air · on Oct 30, 2023

https://web.archive.org/web/20231030201749/https://lukasrose...

_air · on Sept 12, 2023

https://archive.ph/JMEEX

neonate · on Sept 12, 2023

http://web.archive.org/web/20230912014545/https://changelog....

_air · on Jan 4, 2022

I think the quote reads: "Far and away the best prize that life offers is the chance to work hard at work worth doing."