Hacker News | mhitza's comments

Featuritis. Just keep pumping out features with no thought. Today, probably also AI-coded.

Even in mid-sized projects, if you keep pushing for only new features you'll get a similar system. At least that's been my experience in 3 or so mid-sized projects I've worked on where nothing mattered other than checking off features from a huge backlog.


Ah, been at a company like that once before. After a while a dedicated team was created to go in and fix broader issues and essentially stop the system from collapsing under its own weight.

It's a MoE model, and the A3B stands for 3 billion active parameters, like the recent Gemma 4.

You can try to offload the experts to the CPU with llama.cpp (--cpu-moe), and that should free up quite a bit of extra context space, at a lower token-generation speed.
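A minimal sketch of such an invocation (the model filename and context size here are placeholders, not from the thread):

```shell
# Keep the MoE expert tensors in system RAM (--cpu-moe) while the dense
# layers and attention still run on the GPU (-ngl); the VRAM this frees
# can go toward a larger context window (-c).
llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf --cpu-moe -ngl 99 -c 65536
```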


Mac has unified memory, so 36GB is 36GB for everything: GPU and CPU.

--cpu-moe still helps when combined with mmap. It shouldn't hurt token-generation speed much on the Mac, since the CPU has access to most (though not all) of the unified memory bandwidth, which is the bottleneck.

I'll try to use that, but llama-server has mmap on by default and the model still takes up its full size in RAM; not sure what's going on.

Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.
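A sketch of that troubleshooting step (the model path is a placeholder): with -ngl 0 nothing is offloaded to the GPU, so the default mmap path can page weights in lazily, and a comparison run with --no-mmap shows how much resident memory mmap actually saves.

```shell
# CPU-only run: -ngl 0 keeps all layers off the GPU, so weights stay
# mmap'd (the default) and are paged in on demand.
llama-server -m model.gguf -ngl 0

# Same run with mmap disabled, for comparison: the whole model is read
# into RAM up front.
llama-server -m model.gguf -ngl 0 --no-mmap
```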

For sure, I was running on autopilot with that reply. Though in Q4 I would expect it to fit, as the 24B-A4B Gemma model without CPU offloading got up to 18GB of VRAM usage.

Should I expect the same memory footprint from a model with N active parameters as from one with simply N total parameters?

No - this model has the weights memory footprint of a 35B model (you do save a little bit on the KV cache, which will be smaller than the total size suggests). The lower number of active parameters gives you faster inference, including lower memory bandwidth utilization, which makes it viable to offload the weights for the experts onto slower memory. On a Mac, with unified memory, this doesn't really help you. (Unless you want to offload to nonvolatile storage, but it would still be painfully slow.)

All that said you could probably squeeze it onto a 36GB Mac. A lot of people run this size model on 24GB GPUs, at 4-5 bits per weight quantization and maybe with reduced context size.
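A rough back-of-the-envelope check of that claim (assuming 35B total parameters at 4.5 bits per weight, and ignoring KV cache and runtime overhead):

```shell
# weight memory ≈ total_params * bits_per_weight / 8 bytes
awk 'BEGIN { printf "%.1f GiB\n", 35e9 * 4.5 / 8 / 2^30 }'
# prints: 18.3 GiB
```

So the weights alone land somewhere around 18GiB at that quantization, which is consistent with the ballpark of squeezing it onto 24GB GPUs with some room for KV cache.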


I don't get it. Macs have unified memory, so how would offloading experts to the CPU help?

I bet the poster just didn't remember that important detail about Macs; it is kind of unusual from a normal-computer point of view.

I wonder though, do Macs have swap? Could unused experts be offloaded to swap?


Of course the swap is there for fallback but I hate using it lol as I don't want to degrade SSD longevity.

Extra problems with the copyright industry for no benefit.

Hope the owner's OpSec was good enough and we won't hear about their unmasking.


They have a $500k[1] reward for finding OpSec failures, so I think they have the basics down.

[1]https://software.annas-archive.gl/AnnaArchivist/annas-archiv...


No way Anna’s archive has $500k

Why not? Are they going to scam the person who completes the Google Books bounty for 200k?

Extra? I thought they were clearly violating IP law to begin with. Unless I misunderstand this is "water is wet" territory (both the judgment as well as what Anna's Archive did).

Extra, because with the piracy of music they brought members of the recording industry (and, implicitly, its trade group) into the equation: https://en.wikipedia.org/wiki/Recording_Industry_Association...

Water isn't wet, but it does "wet" other things. Wetness is the degree to which a liquid contacts and adheres to a solid surface, so it makes no sense to say that water is wet.

I do not see any law being violated by Anna's Archive in the slightest.

Just because you disagree with a law doesn't mean that it doesn't exist. You anti-copyright shills are exhausting... Why can't you try to attract people to your side and eventually effect some real change instead? Do you just take that much pleasure in being an edgelord, your cause be damned?

Just use it to train / tune a LLM. Apparently, everything becomes legal if you only put the stuff into the right kind of software.

That's at least what many people like to argue here on HN.


Anna's Archive wants[1] companies to train on their data.

[1] https://annas-archive.gl/blog/ai-copyright.html


Thanks a lot, that's an interesting read and they make an interesting case.

I would have thought all big AI companies used Anna's Archive, but apparently only some of the US-based companies used them.


hmm you are right, I too wish the same brother

Contrast looks good for the text, but the font used has very thin lines. A thicker font would have been readable on its own. At 250% page zoom it's good enough, if you don't want to enable the browser's built-in reader mode.

What EU country are you from? For me there were mostly upsides to being in the EU: free travel, better consumer legislation, more individual rights and protections, etc.

I wouldn't put a date on predictions, but without the right to veto they're playing harder into the nationalistic propaganda of "Brussels forces us".

https://michalovadek.github.io/eu-veto-tracker/. It's not just the nationalistic usual suspects that use their veto power.

This rightly points out that many issues that are known to attract a veto don't even get brought up. Removing the veto will change this, and I expect lightning-rod topics and disputes to come up much more frequently.

Same with the free-riding comment. Removing the veto will expose some nations' "true colors" in ways that most do not anticipate. It's not all sunshine and rainbows of agreement among the EU member states.


> many issues that are known to attract a veto don't even get brought up.

It's quite disingenuous to blame the veto power for the lack of discussion on important issues. If anything, it's an argument in favor of the veto: the only reason to avoid discussion when you lack coercive power is weak arguments, and there's no need to waste time with such nonsense.

> Removing the veto will expose some nations' "true colors" in ways that most do not anticipate.

Another slippery argument: there is absolutely no reason to hide the "true colors" of veto-capable members you disagree with. Actually, the opposite is true: one will have to come up with more, and more convincing, true-color-exposing arguments in order to apply pressure via their electorate.

> It's not all sunshine and rainbows of agreement among the EU member states.

No it's not, there are shady forces who dream about coercion for the worse.


This is underpants-gnomes thinking. If the compelling arguments were there and they were politically tenable, they would be voiced already.

Nobody is keeping obvious policy programs in their back pocket. Especially when politicians are chasing clips.


> The finding I did not expect: model quality matters more than token speed for agentic coding.

I'm really surprised how that was not obvious.

Also, instead of limiting context size to something like 32k, you can offload the MoE experts to the CPU with --cpu-moe, at the cost of roughly halving token generation speed.


Why would token speed matter for anything other than getting work done faster? It's in the name - "speed".

This would be true if the models were capable of always completing the tasks. But since their failure rate is fairly high, going in a wrong direction for longer could mean that you take more time than with a faster model, where you can spot it going wrong earlier.

Yeah, it’s like drinking coffee when being really tired. You’re still tired, just “faster”, it’s a weird sensation.

It's even stranger that it's not obvious to someone who uses Codex extensively every day.

The rate limiting step is the LLM going down stupid rabbit holes or overthinking hard and getting decision paralysis.

The only time raw speed really matters is if you are trying to add many, many lines of new code. But if you are doing that at token-limited rates, you are going to be approaching the singularity of an AI-slop codebase in no time.


Automated decision-making, such as banning, is restricted under the GDPR. Those facing this issue should file a complaint with their DPA. I'm sure Musk would love another series of fines in the EU.

Indeed, a very slop-feeling "whitepaper"; it might as well have been written by ChatGPT/Claude, because it has all the tropes.

Multiple sections have expandable subsections for more details on proposals.


Reads like asking for an EU handout. It touches on some visible issues in the single market, but most of what I've seen is not warranted, e.g. minimum spending quotas for AI work/integration/research, using European models (which basically today means: use Mistral), or carving out residency-process exceptions for AI researchers.
