Hacker News | theK's comments

Aren't these numbers really bad? > 80% needle retrieval means every fifth memory is akin to a hallucination.

I don't think it quite means that - happy to be corrected on this, but I think it's more like what percentage it can still pay attention to. If you only remembered "cat sat mat", that's only 50% of the phrase "the cat sat on the mat", but you've still paid attention to enough of the right things to be able to fully understand and reconstruct the original. 100% would be akin to memorizing & being able to recite in order every single word that someone said during their conversation with you.

But even if I've misunderstood how attention works, the numbers are relative. GPT 5.4 at 1M only achieves 36% needle retrieval. Gemini 3.1 & GPT 5.4 are only getting 80% even at the 128K point, but I think people would still say those models are highly useful.


It seems to be the hit rate of a very straightforward (literal matching) retrieval. I just checked the benchmark description (https://huggingface.co/datasets/openai/mrcr); here it is:

"The task is as follows: The model is given a long, multi-turn, synthetically generated conversation between user and model where the user asks for a piece of writing about a topic, e.g. 'write a poem about tapirs' or 'write a blog post about rocks'. Hidden in this conversation are 2, 4, or 8 identical asks, and the model is ultimately prompted to return the i-th instance of one of those asks. For example, 'Return the 2nd poem about tapirs'."
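To make the "literal matching" point concrete, here is a rough sketch of an MRCR-style setup, assuming a toy conversation format. The function names, topics and placeholder replies are all made up for illustration; this is not the dataset's actual generator or grader. It shows that when the hidden asks are string-identical, finding the i-th instance needs nothing beyond exact matching and counting.

```python
import random

# Hypothetical MRCR-style sketch: forms, topics and replies are invented for
# illustration; this is not the dataset's real generator or grading code.
FORMS = ["poem", "blog post", "short story"]
TOPICS = ["tapirs", "rocks", "lighthouses", "glaciers"]


def build_conversation(target_ask, n_copies, n_distractors, seed=0):
    """Return (user_ask, assistant_reply) turns with n_copies of target_ask
    hidden among randomly chosen distractor asks."""
    rng = random.Random(seed)
    distractors = [f"write a {f} about {t}" for f in FORMS for t in TOPICS]
    distractors = [d for d in distractors if d != target_ask]
    asks = [target_ask] * n_copies + [rng.choice(distractors) for _ in range(n_distractors)]
    rng.shuffle(asks)
    # Each reply is unique, so "the i-th instance" has exactly one correct answer.
    return [(ask, f"<reply #{k} to: {ask}>") for k, ask in enumerate(asks)]


def literal_retrieve(conversation, target_ask, i):
    """Return the reply to the i-th (1-based) occurrence of target_ask,
    using nothing but exact string comparison on the user turns."""
    matches = [reply for ask, reply in conversation if ask == target_ask]
    if not 1 <= i <= len(matches):
        raise IndexError(f"only {len(matches)} instances of {target_ask!r}")
    return matches[i - 1]


if __name__ == "__main__":
    ask = "write a poem about tapirs"
    convo = build_conversation(ask, n_copies=4, n_distractors=20)
    # Equivalent of the prompt "Return the 2nd poem about tapirs".
    print(literal_retrieve(convo, ask, 2))
```

The matching logic itself is trivial; what the benchmark measures is whether a model can do the equivalent over hundreds of thousands of tokens. Variants that avoid literal matching remove exactly this shortcut.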

As a side note, steering away from literal matching already crushes performance at 8k+ tokens: https://arxiv.org/pdf/2502.05167, although the models in that paper are quite old (GPT-4o-ish). It would be interesting to run the same benchmark on the newer models.

Also, there is strong evidence that aggregating over long context is much more challenging than the "needle extraction task": https://arxiv.org/pdf/2505.08140

All in all, in my opinion, "context rot" is far from being solved.


Damn, I am the product A-GAIN?

No, it is not. Anything combustion-related certainly isn't, as has been proven ad absurdum. All non-BEV, non-combustion alternatives are, to put it optimistically, in their infancy. So yes, BEVs are the future for the next 20-40 years at a minimum.

Edit: clarity


It's not only the automotive lobby, to be fair.

Lobbies also moved Germany out of solar panel production, batteries and lately heat pumps.

So yeah, the legacy industry lobbies are the problem, but not exclusively the automaker ones.


This is a very well researched essay regarding the solar panel industry in China and Germany: https://www.youtube.com/watch?v=QoCoPmtNRJo I really recommend watching / reading sober assessments like this.

This is the strategic decision that was the last nail in the coffin for European battery cell manufacturing: https://www.reuters.com/article/business/bosch-shuns-battery...

It is a rational assessment of realities when it comes to high end production. Not every industrial environment can produce every kind of industry. At some point the costs are too high to overcome the difference.


> Lobbies also moved Germany out of solar panel production, batteries

Was it lobbies, or costs which aren't competitive with China?


AFAIK, it was lobbies and conservative governments that chose to leave the question to the "free market", completely disregarding the fact that the competing regions were heavily subsidizing those industries.

My account has more comments than that, and I share OC's opinion and experience. I've been daily driving /e/ since the FP3 era, and lately the experience has been really well polished. Even things that had been "tricky" in the past, like Android Auto integration, now work seamlessly.

Same here, except for banking apps. The one I am supposed to use now doesn't work.

My banking app doesn't even work on the last 3 android phones I tried because it wants a very up-to-date OS which basically means non-Pixel phones more than 2 years old need not apply.

I had this problem and it turned out to be an upstream issue with MicroG which was eventually patched. If you have an error message you can search for existing issues on /e/OS' gitlab/forum.

Revolut stopped working for me for a while with the error that the bootloader wasn't recognised and rooted phones aren't supported. After about a month an OS update solved it.


As long as banking works with web browsers, I think the future looks good for this usage, but I could de-bank my phone and still have plenty of useful things to do with it.

The problem in my case is that authentication for the bank's website requires an app, and that app doesn't work. I am locked out of online banking for that bank because of it. (They also have a Windows app that I have not yet managed to get working on Linux/Wine.)

Find a new bank. Mine has a different problem: it's so far behind that it doesn't support MFA beyond a Symantec program I'd never heard of before. I don't use it, but I can't use my YubiKeys either.

Ugh. I'm locked in with my bank because of my mortgage terms, and the prospect of one day being forced to install spyware on my phone really scares me.

So does the FP3 ship with /e/, or do you need to install it manually?

Murena have been selling Fairphones preinstalled with /e/ OS. I think right now it's the FP6 and FP5 they do.

https://murena.com

It's not only Fairphones. I'd say they've got an interesting mix of tech there. I think they even sold refurbished Pixels flashed with /e/ for a while.


It didn't come preinstalled, and the FP3 isn't sold anymore, so you'll have to install it yourself. It was dead easy, though, nowhere near the complexity of installing Lineage back in the day (though that's gotten easier too).

/e/ on FP3 has been my daily driver for 1.5 years and my daughters run it on their FP4s with no problems at all.


> AI is great at allowing you not to write the dumb boiler plate we all could crank...

I've actually started having a different view on this. After getting past the "glancing at instead of reading LLM suggestions" phase, I started noticing that even for simple or boilerplate tasks, LLMs all too often produce quite wasteful results regardless of the setting or your subscription. They are OK for getting you going, but in recent weeks I haven't accepted a single Claude, Devstral or GPT suggestion verbatim. Nevertheless, I often throw boilerplate tasks at them even though I now know that I'll typically end up coding 6 out of 10 myself and only use the other four as skeletons. But just seeing the "naive" or "generic" implementation and deciding I don't like it is still a plus, as it seems to cut the time spent thinking about it considerably.


Indeed. And not only do they focus on this but the execution is just so beautifully spot on.

As others have mentioned, it might even be a bit too spot-on at times.


I don't see how it qualifies as a legitimate download or ownership. You cannot save the file to a disk you control, and you have no way to ensure continued access to it. Apple or the IP holder can make this "download" disappear from your device/account without prior warning. It's actually written in the terms.


Is a monthly subscription better? Few are willing to buy and rip physical media.


That's a problem for the "many" then to sort out. Everyone else that wants to own their media just uses Automatic Ripping Machine.


Interesting how Microsoft came up twice today in connection with AI-slop documentation.

https://news.ycombinator.com/item?id=47067759


> Good Circuits

Afaic, people designing circuits still do care about that.

> Good Assembly

The thing with the current state of coding is that we are not replacing "Coding Java" with something else. We are replacing it with "Coding Java via discussion". That can be fine at times, but it is still a game of diminishing returns. LLMs still make surprising mistakes, too often forget specifics, make naive assumptions and happily follow local minima. In the long run, all of the above leads to inflated codebases, which in turn lead to bogged-down projects and detached devs.

