And how is this comment relevant here? The abstract lists the digestible model names, and you can find the details in the supplementary text:
> To evaluate user-facing production LLMs, we studied four proprietary models: OpenAI’s GPT-5 and GPT-4o (80), Google’s Gemini-1.5-Flash (81) and Anthropic’s Claude Sonnet 3.7 (82); and seven open-weight models: Meta’s Llama-3-8B-Instruct, Llama-4-Scout-17B-16E, and Llama-3.3-70B-Instruct-Turbo (83, 84); Mistral AI’s Mistral-7B-Instruct-v0.3 (85) and Mistral-Small-24B-Instruct-2501 (86); DeepSeek-V3 (87); and Qwen2.5-7B-Instruct-Turbo (88).
edit: It looks like OP attached the wrong link to the paper!
Nontechnical people simply don't have any idea what LLMs are. Their only mental model comes from science fiction, plus the simple fact that we possess a theory of mind. It would be astonishing if people were able to casually not anthropomorphize LLMs, given that untold millions of years' worth of evolution of the simian neocortex is trying to convince you that anything that talks like that must be another mind similar to yours.
Also, many, many people suffer from low self-esteem, and being showered with endorsement and affirmation by something that talks like an authority figure must be very addictive.
My parents ended up being forced by circumstances to move into a retirement home about five years ago. Fortunately, the place turned out to be run by people who mostly cared about their clients and so my parents' lives were basically OK, except that the food sucked (which AFAICT is par for the course at retirement homes). But a few months ago the place was acquired by a different company, which is trying to squeeze out higher profits. Staffing and services are being cut, and prices are going up. Even the food got worse, which I didn't think was even possible. The response when someone complains is, "If you don't like it you are free to leave."
Yeah, right. My barely mobile 90-year-old parents, one of whom has Parkinson's, are just going to pack up and go. They know perfectly well that they have a captive audience.
Thankfully, my mother died before the acquisition, and my father died last week, only a few months after the acquisition, so I don't have to deal with this any more. But caveat emptor: if you ever go into a retirement home, think about what will happen if they change ownership. Even if it looks great, or even acceptable, now, there is no guarantee that it will still be great, or even acceptable, tomorrow, unless you somehow manage to negotiate such a guarantee. I have no idea what a contract provision like that would even look like. But I am going to be facing this problem myself some day, so I'd love to hear ideas.
Incredibly small concession that doesn’t warrant this article’s absolutely insane framing: “Even less of a problem than we thought,” “very, very good news,” “already sounded perfectly manageable.”
The author is so giddy to defend this monopolistic restriction on Google’s part. Hackers can use F-Droid without annoyance, but this really does kill any chance at normies using it. They absolutely will use the worst spyware on Google Play instead, and the author seemingly loves it.
A pastime I have with papers like this is to look for the part in the paper where they say which models they tested. Very often, you find either A) they used a model from one or more years ago, with the paper only just being published now, or B) they don't even say which model they are using. The best I could find in this paper:
> We evaluated 11 user-facing production LLMs: four proprietary models from OpenAI, Anthropic, and Google; and seven open-weight models from Meta, Qwen, DeepSeek, and Mistral.
(and graphs include model _sizes_, but not versions, for open weight models only.)
I can't comprehend how stating which model you are testing is not commonly understood to be a basic requirement.
One of the authors (of one of the two models, not this particular paper) here. Just a clarification: these models are *not* burned into silicon. They are trained with brutal QAT but are put onto FPGAs. For axol1tl, the weights are burned in the sense that they are hard-wired in the fabric (i.e., shift-add instead of a conventional read-mul-add cycle), but not into the raw silicon, so the chip can be reprogrammed. Though, for projects like smartpixel or the HGCAL readout, there are similar ones targeting silicon (google something like "smartpixel cern" or "HGCAL autoencoder" and you will find them), and I thought it was one of them when viewing the title.
Some slides with more info: https://indico.cern.ch/event/1496673/contributions/6637931/a...
The approval process for a full paper is quite lengthy in the collaboration, but a more comprehensive one is coming in the following months, if everything goes smoothly.
Regarding the exact algorithm: there are a few versions of the models deployed. Before v4 (when this article was written), the models are described in slides 9-10. The model was trained as a plain VAE that is essentially a small MLP. At inference time, the decoder was stripped and the mu^2 term from the KL divergence was used as the anomaly score (contributions from terms containing sigma were found to have negligible impact on signal efficiency). In v5 we added a VICReg block before that and used the reconstruction loss instead. Everything runs in ≤2 clock cycles at a 40 MHz clock. Since v5, the hls4ml-da4ml flow (https://arxiv.org/abs/2512.01463, https://arxiv.org/abs/2507.04535) was used for putting the model on FPGAs.
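If it helps make that concrete, here is a minimal sketch of the "strip the decoder, score with mu^2" idea in PyTorch. Purely illustrative: the layer sizes, names, and input dimension are made up, and the real thing is a heavily quantized model in firmware, not Python.

```python
import torch
import torch.nn as nn

class TinyVAEEncoder(nn.Module):
    """Toy VAE encoder: after training, the decoder is dropped and the
    mu^2 part of the KL term is used directly as the anomaly score."""

    def __init__(self, n_inputs=57, n_hidden=16, n_latent=3):  # made-up sizes
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU())
        self.mu = nn.Linear(n_hidden, n_latent)       # kept for inference
        self.logvar = nn.Linear(n_hidden, n_latent)   # only needed during training

    def anomaly_score(self, x):
        # Score = sum_i mu_i^2, i.e. the mu-dependent piece of
        # KL(N(mu, sigma) || N(0, 1)); the sigma-dependent terms are ignored.
        mu = self.mu(self.backbone(x))
        return (mu ** 2).sum(dim=-1)

model = TinyVAEEncoder()
x = torch.randn(8, 57)            # a batch of dummy trigger inputs
print(model.anomaly_score(x))     # higher score => more anomalous event
```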
For CICADA, the model was trained as a VAE again, but this time distilled with a supervised loss on the anomaly score over a calibration dataset. Some slides: https://indico.global/event/8004/contributions/72149/attachm... (not up to date, but I don't know if there are newer open ones). Both student and teacher were conventional conv-dense models; they can be found in slides 14-15.
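And a correspondingly hedged sketch of that distillation step: a student network regresses the teacher VAE's anomaly score on a calibration set with a plain MSE loss. All names and hyperparameters here are invented; the deployed CICADA models are conv+dense and live in firmware.

```python
import torch
import torch.nn as nn

def distill(teacher_score_fn, student, calib_loader, epochs=5, lr=1e-3):
    """Supervised distillation on the anomaly score: the student learns to
    reproduce the teacher's score on each calibration batch."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in calib_loader:
            with torch.no_grad():
                target = teacher_score_fn(x)            # teacher VAE's score
            loss = mse(student(x).squeeze(-1), target)  # student regresses it
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```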
Just to plug some of my own work on running QAT (high-granularity quantization) and doing deployment (distributed arithmetic) of NNs in the context of such applications (i.e., FPGA deployment for <1 us latency), if you are interested: https://arxiv.org/abs/2405.00645 and https://arxiv.org/abs/2507.04535
Decades of HN users finger-wagging and suggesting FOSS hardware have progressed society nowhere. Twelve months out from the EU mandating replaceable batteries, and products across the industry are being redesigned with repairability, USB-C, and user-friendly designs.
It’s time to accept that regulation actually does work when you have a competent government.
Voyager 1 & 2 are among my favourite human science achievements, not even so much from a technology standpoint, as they're relatively simple compared to what we have now (although that's one of the charms), but just for the fact that they're so far away, they still more or less work long after the scheduled mission end time, we can communicate with them, and despite all the modern technology progress, it would take decades to catch up. Absolutely amazing and inspiring!
>Here’s hoping governments regulate laptop manufacturers to actually make repairable machines in the future.
No, this is a bad solution. If you want a repairable machine, buy one. They exist. Others have already mentioned Framework, but there are other options that aren't that far down the spectrum either.
One of the things macbook users praise the most is "build quality", which often means the solidity of the device, lack of flex, etc. These quality features are, in part, achieved by the same choices that make it hard to repair. Ease of repair and "build quality" are to some degree (although not entirely) tradeoffs against each other.
I say this as a framework owner who would never buy something as irreparable as a macbook. Regulation is not the answer here.
I've always said this: AI will win a Fields Medal before being able to manage a McDonald's.
Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).
LLMs are discovering a lot of new math because they are great at low-depth, high-breadth situations.
I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
The biggest sign something is broken is when someone writes: "Thankfully, my mother died before the acquisition, and my father died last week, only a few months after the acquisition, so I don't have to deal with this any more."
That's true, but it might not be as important here.
Spain is not a country with a common law legal system like the US or the UK. It has a civil law system where prior court judgements do not form strictly binding precedent. Prior judgements can be important, but case law is not really a thing.
The thruster fix is the part that gets me. They sent a command that would either revive thrusters dead since 2004 or cause a catastrophic explosion, then waited 46 hours for the round trip with zero ability to intervene. That's a production deployment with no rollback, no monitoring dashboard, and a 23-hour latency on your logs. They nailed it.
There's a crazy story in here where Sytse invested in a click chemistry cancer research startup (Shasqi) in 2017 and ends up becoming a customer six years later.
Laws' intent is often clarified in courts through judgments. If you can overlay the judgments on top of the corresponding law, at the correct points in time, I think that will have value. It might, for example, show which laws were referenced the most and which needed to be clarified the most. It might give insights into which legal language constructs stood the test of time and which had to be repeatedly clarified.
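To make that concrete, here's a rough sketch of what such an overlay could look like as a data structure; every name and field here is hypothetical, just to illustrate the idea:

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Judgment:
    court: str
    decided: date
    summary: str                       # what the court clarified

@dataclass
class LawSection:
    law: str
    section: str
    enacted: date
    text: str
    judgments: list[Judgment] = field(default_factory=list)

    def as_of(self, when: date) -> list[Judgment]:
        """Judgments that had clarified this section by a given date."""
        return [j for j in self.judgments if j.decided <= when]

def most_clarified(sections: list[LawSection], top: int = 10):
    """Rank sections by how often courts had to clarify them."""
    counts = Counter({(s.law, s.section): len(s.judgments) for s in sections})
    return counts.most_common(top)
```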
The price for wholesale electricity is set by a bidding process, with each generating company saying what it would be willing to accept to produce a unit of electricity.
Once built, the cost of generating power from renewables is very low, so these typically come in with the cheapest bid. Nuclear might come next.
Gas generators often have the highest costs, because they have to buy gas to burn, as well as paying a "carbon price" - a charge for emissions.
The wholesale cost is set by the last unit of electricity needed to meet demand from consumers. This means that even if gas only generates 1% of power at a given time, gas will still set the wholesale price.
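A toy illustration of that marginal-pricing mechanism, with invented numbers and plant names: bids are sorted by price and accepted until demand is met, and the last accepted bid sets the price paid to every accepted generator.

```python
# Toy merit-order auction: the most expensive bid still needed to cover
# demand sets the wholesale price for every accepted generator.
bids = [  # (generator, capacity in MWh, bid in GBP/MWh) -- made-up numbers
    ("wind",    40, 5),
    ("solar",   20, 8),
    ("nuclear", 25, 40),
    ("gas",     30, 120),
]
demand = 90  # MWh

supplied, clearing_price = 0, 0
for name, capacity, price in sorted(bids, key=lambda b: b[2]):
    if supplied >= demand:
        break
    supplied += min(capacity, demand - supplied)
    clearing_price = price   # price of the marginal (last accepted) unit

print(clearing_price)  # 120: gas covers only ~6% of demand but sets the price
```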
"The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons.
We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.
We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (https://openreview.net/forum?id=tO3ASKZlok).
We would greatly appreciate your attention and help in sharing it."
Not a dumb question. The shortest (and at a glance unsatisfactory) answer is because it works, and therefore it evolved that way.
Going into detail, first consider that for a feature to be evolutionarily selected for, two things have to be true:
1. It must increase the fitness of the organism that carries it, i.e. the likelihood of its carrier having descendants as compared to non-carriers ( or be a side effect of another feature that improves fitness enough to be a net positive, etc etc )
2. It must be inheritable (and, in sexually reproduced organisms, mutually compatible during embryonic development).
Once such a feature has reached dominance in a given population, as long as it continues to be important for fitness it cannot really be deprecated in favour of an alternative built from scratch, even if that alternative is arguably better.
That's why, for instance, vertebrate optic nerves connect to our retinas on the inside of our eyeball, resulting in us having a blind spot. Cephalopods, on the other hand, evolved their eyes independently the "reasonable" way, connecting their nerves from behind the eyeball. There's no way a vertebrate could mutate from scratch for its optic nerve to connect to the retina from behind without causing absolute mayhem in embryonic development. Our hacky solution for the blind spot? Let the brain hide it in software.
Going back to your question, some spots of the body being more sensitive than others became critical for evolutionary fitness long before nervous systems were complex enough to generate conscious qualia, let alone enough for them to be consistently involved in decision making. Furthermore, mapping of specific nerves to intensity of feeling on the CNS would imply complex hardcoding of something which is much easier to solve with "this place important, have more nerves", and maybe would even conflict with the fitness benefit of a CNS with enough neuroplasticity to learn anew during the development and lifetime of an organism.
So, in summary, the solution of having more nerves where it matters is simple, good enough, and has no reason to be rolled back in favour of a radically different alternative.
I'm on an electricity tariff where the per-kWh unit price changes every 30 minutes; you're basically being charged at market rate or thereabouts, and the prices for the next 28 hours are announced at 4pm every day.
Generally the prices between 4pm-7pm are expensive and the rest of the time it's cheaper, although with current world events things have gotten a little spicy lately.
On really windy days you definitely get to see the benefit, where prices drop to zero or even negative, which is great if you have an EV or something to dump lots of power into. Looking at today's prices, they're like 1-3p per kWh!
But that doesn't last, as the wind dies things start to get back to normal.
The key with the tariff, though, is to just play the averages and generally avoid high power usage during the peak periods. My average for the last 2 years was around 30% cheaper (p/year) than what I would have paid if I were on a normal energy tariff.
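As a rough sketch of how that comparison works (illustrative numbers, not my actual bills): take the half-hourly prices and your usage in each slot, and compare against a flat unit rate.

```python
# Toy comparison of a half-hourly (dynamic) tariff vs a flat tariff.
# Prices in pence per kWh, usage in kWh per half-hour slot -- invented numbers.
prices = [3, 3, 5, 12, 35, 35, 10, 6]               # dynamic price per slot
usage  = [0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.5, 0.8]   # some load shifted off-peak
flat_rate = 25                                       # flat tariff, p/kWh

dynamic_cost = sum(p * u for p, u in zip(prices, usage))
flat_cost = flat_rate * sum(usage)

print(f"dynamic: {dynamic_cost:.1f}p, flat: {flat_cost:.1f}p, "
      f"saving: {100 * (1 - dynamic_cost / flat_cost):.0f}%")   # ~38% here
```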
It will be interesting to see whether that trend continues, especially with the state the world has suddenly been thrown into.
Solar is nothing less than a revolution in Pakistan. Almost every home and factory has solar installed on its roof. More affluent houses have almost gone off-grid; others are selling back to the grid, and those who can't afford that have their own small-scale 12V solar panels to run fans in the scorching summer of Pakistan and save on electricity bills. It is all done by people independently, without much support from the government, as the ROI on solar (if you are using the full potential of your installed capacity, it can be as low as 1 year, and afterwards it is essentially free) is much better than paying the grid.
I myself have got one on my roof: 6 kW with a 5 kWh battery backup, costing me 700K, roughly $2,500. Now I can use the AC without thinking about electricity bills, and most importantly, I do not have to face the inconvenience of the grid being unavailable, in some cases for 24 hours.
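For a rough payback estimate under assumed numbers (the sun-hours and grid tariff below are just plausible placeholders, not my actual bills):

```python
# Rough payback estimate for a rooftop solar system -- all numbers assumed.
system_cost_usd = 2500            # 6 kW array + 5 kWh battery, as above
daily_generation_kwh = 6 * 4.5    # assuming ~4.5 peak-sun hours per day
grid_price_usd_per_kwh = 0.15     # assumed effective grid tariff

yearly_savings = daily_generation_kwh * 365 * grid_price_usd_per_kwh
print(f"payback: {system_cost_usd / yearly_savings:.1f} years")   # ~1.7 years
```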
Now Pakistan is facing an energy crisis not because it does not have enough, but because it has too much: people are generating their own, and due to the nature of the contracts with electricity-producing companies, the government has to pay them according to their installed capacity, not what they actually generate.
According to a government report, 116,816 GWh was consumed commercially in 2021; in 2024 it stands at 111,110 GWh, and in 2025 and 2026 it will likely be even lower.
I'd feel obliged to add some "but, her emails..." reference.
But it feels a million years away.
It's interesting to wonder how you get out of a spiral of incompetence and borderline (to be polite) corruption at the highest level.
Putting those people in charge was quick; sure, a future administration could put them out quickly enough; but how long will there be decently skilled people willing to take those positions? How long until the only ones who want to put their toes in the swamp are those who really enjoy the mud?
Put differently: can a liberal democracy organize a "just" version of a purge?
That's not the use case. The use case is running apps from a remote Linux host as a local window. A performant VNC for specific windows, if you will.
For example, you could run VS Code on that machine as a window on your Mac. A more real-world example is people accessing GUIs (e.g. MATLAB) on lab clusters.
The closest setup for X11 would be to use X11 forwarding with xpra.
I understand why OpenAI is trying to reduce its costs, but the claim that AI crawlers aren't creating very significant load simply isn't true, especially for those crawlers that ignore robots.txt and hide their identities. This is direct financial damage, and it's particularly hard on nonprofit sites that have been around a long time.
> If you have a public website, they are already stealing your work.
I have a public website, and web scrapers are stealing my work. I just stole this article, and you are stealing my comment. Thieves, thieves, and nothing but thieves!
Having over a decade of open source software I've written freely available online, I actually really appreciate the value that AI && LLMs have provided me.
The thing that leaves a bad taste in my mouth is the fact that my works were likely included in the training data and, even if that doesn't violate my licenses (GPL v2/v3), it certainly feels against the spirit of what I intended when distributing my works.
I was made redundant recently "due to AI" (questionable), and it feels like my works in some way contributed to my redundancy, and to the profits made by these AI megacorps, while I am left a victim.
I wish I could be provided a dividend or royalty, however small, for my contribution to these LLMs but that will never happen.
I've been looking for a copy-left "source available" license that allows me to distribute code openly but has a clause that says "if you would like to use these sources to train an LLM, please contact me and we'll work something out". I haven't yet found that.
I'm guessing that such a license would not be enforceable because I am not in the US, but at least it would be nice to declare my intent and who knows what the future looks like.
>You think of something new and express it - through a prompt, through code, through a product - it enters the system. Your novel idea becomes training data. The sheer act of thinking outside the box makes the box bigger.
This was the same before: if you had a novel idea and made a product out of it, others followed. Especially for LLMs, they are not (so far) learning on the fly. Claude Opus 4.6's knowledge cutoff was August 2025, so every idea you type in after that date may end up in future training data but is not yet available to the model; you only have to be fast enough. And LLMs/AI agents like Claude enable exactly the speed you need for bringing out something new.
The next thing is that we also have open-source and open-weight models that any of us with a decent consumer GPU can fine-tune and adapt, so it's not only in the hands of a few companies.
>We will again build and innovate in private, hide, not share knowledge, mistakes, ideas.
Why should this happen? The moment you make your idea public, anyone can build it. This leads to greater proliferation than before, when the artificial barrier of having to learn to code prevented people from getting what they wanted or what they wanted to create.
The article is about this Stanford study: https://www.science.org/doi/10.1126/science.aec8352
But the link in OP's post points to (what seems to be) a completely unrelated study.