Hacker News | mhitza's comments

Featuritis. Just keep pumping out features with no thought. Today, probably also AI-coded.

Even in mid-sized projects, if you keep pushing for only new features you'll get a similar system. At least that's been my experience in 3 or so mid-sized projects I've worked on where nothing mattered other than checking off features from a huge backlog.


Ah, been at a company like that once before. After a while a dedicated team was created to go in and fix broader issues and essentially stop the system from collapsing under its own weight.

It's a MoE model, and the A3B stands for 3 billion active parameters, like the recent Gemma 4.

You can try to offload the experts to the CPU with llama.cpp (--cpu-moe), and that should free up quite a bit of extra context space, at a lower token-generation speed.
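A minimal sketch of such an invocation (the model filename and context size here are placeholders, not from the thread):

```shell
# Keep the MoE expert tensors in system RAM (--cpu-moe) while the dense
# layers and attention still run on the GPU (-ngl); the VRAM this frees
# can go toward a larger context window (-c).
llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf --cpu-moe -ngl 99 -c 65536
```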


Mac has unified memory, so 36GB is 36GB for everything: GPU and CPU.

--cpu-moe still helps when combined with mmap. It shouldn't hurt token-generation speed much on the Mac, since the CPU has access to most (though not all) of the unified memory bandwidth, which is the bottleneck.

I'll try to use that, but llama-server has mmap on by default and the model still takes up its full size in RAM; not sure what's going on.

Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.
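A sketch of that troubleshooting step (the model path is a placeholder): with -ngl 0 nothing is offloaded to the GPU, so the default mmap path can page weights in lazily, and a comparison run with --no-mmap shows how much resident memory mmap actually saves.

```shell
# CPU-only run: -ngl 0 keeps all layers off the GPU, so weights stay
# mmap'd (the default) and are paged in on demand.
llama-server -m model.gguf -ngl 0

# Same run with mmap disabled, for comparison: the whole model is read
# into RAM up front.
llama-server -m model.gguf -ngl 0 --no-mmap
```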

For sure, I was running on autopilot with that reply. Though in Q4 I would expect it to fit, as the 24B-A4B Gemma model without CPU offloading got up to 18GB of VRAM usage.

Should I expect the same memory footprint from a model with N active parameters as from one with simply N total parameters?

No - this model has the weights memory footprint of a 35B model (you do save a little bit on the KV cache, which will be smaller than the total size suggests). The lower number of active parameters gives you faster inference, including lower memory bandwidth utilization, which makes it viable to offload the weights for the experts onto slower memory. On a Mac, with unified memory, this doesn't really help you. (Unless you want to offload to nonvolatile storage, but it would still be painfully slow.)

All that said you could probably squeeze it onto a 36GB Mac. A lot of people run this size model on 24GB GPUs, at 4-5 bits per weight quantization and maybe with reduced context size.
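A rough back-of-the-envelope check of that claim (assuming 35B total parameters at 4.5 bits per weight, and ignoring KV cache and runtime overhead):

```shell
# weight memory ≈ total_params * bits_per_weight / 8 bytes
awk 'BEGIN { printf "%.1f GiB\n", 35e9 * 4.5 / 8 / 2^30 }'
# prints: 18.3 GiB
```

So the weights alone land somewhere around 18GiB at that quantization, which is consistent with the ballpark of squeezing it onto 24GB GPUs with some room for KV cache.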


I don't get it. Macs have unified memory, so how would offloading experts to the CPU help?

I bet the poster just didn't remember that important detail about Macs; it is kind of unusual from a normal-computer point of view.

I wonder though, do Macs have swap? Could unused experts be offloaded to swap?


Of course the swap is there for fallback but I hate using it lol as I don't want to degrade SSD longevity.

Extra problems with the copyright industry for no benefit.

Hope the owner's OpSec was good enough and we won't hear about their unmasking.


They have a $500k[1] reward for finding OpSec failures, so I think they have the basics down.

[1]https://software.annas-archive.gl/AnnaArchivist/annas-archiv...


No way Anna’s archive has $500k

Why not? Are they going to scam the person who completes the Google Books bounty for 200k?

Extra? I thought they were clearly violating IP law to begin with. Unless I misunderstand this is "water is wet" territory (both the judgment as well as what Anna's Archive did).

Extra, because with the piracy of music they brought members of the recording industry (and, implicitly, its trade group) into the equation: https://en.wikipedia.org/wiki/Recording_Industry_Association...

Water isn't wet, but it does "wet" other things. Wetness is the degree to which a liquid contacts and adheres to a solid surface, so it makes no sense to say that water is wet.

I do not see any law being violated by Anna's Archive in the slightest.

Just because you disagree with a law doesn't mean that it doesn't exist. You anti-copyright shills are exhausting... Why can't you try to attract people to your side and eventually effect some real change instead? Do you just take that much pleasure in being an edgelord, your cause be damned?

Just use it to train / tune a LLM. Apparently, everything becomes legal if you only put the stuff into the right kind of software.

That's at least what many people like to argue here on HN.


Anna's Archive wants[1] companies to train on their data.

[1] https://annas-archive.gl/blog/ai-copyright.html


Thanks a lot, that's an interesting read and they make an interesting case.

I would have thought all big AI companies used Anna's Archive, but apparently only some of the US-based companies used them.


hmm you are right, I too wish the same brother

Contrast looks good for the text, but the font used has very thin lines. A thicker font would have been readable on its own. At 250% page zoom it's good enough, if you don't want to enable the browser's built-in reader mode.

What EU country are you from? For me there were mostly upsides to being in the EU: free travel, better consumer legislation, more individual rights and protections, etc.

I wouldn't put a date on predictions, but without the right to veto they're playing harder into the nationalistic propaganda of "Brussels forces us".

https://michalovadek.github.io/eu-veto-tracker/. It's not just the nationalistic usual suspects that use their veto power.

This rightly points out that many issues that are known to attract a veto don't even get brought up. Removing the veto will change this, and I expect lightning-rod topics and disputes to come up much more frequently.

Same with the free-riding comment. Removing the veto will expose some nations' "true colors" in ways that most do not anticipate. It's not all sunshine and rainbows of agreement among the EU member states.


> many issues that are known to attract a veto don't even get brought up.

It's quite disingenuous to blame the veto power for the lack of discussion on important issues. If anything, it's an argument in favor of the veto: the only reason to avoid discussion when you lack coercive power is weak arguments, and there's no need to waste time with such nonsense.

> Removing the veto will expose some nations' "true colors" in ways that most do not anticipate.

Another slippery argument: there is absolutely no reason to hide the "true colors" of veto-capable members you disagree with. Actually, the opposite is true: one will have to come up with more, and more convincing, true-color-exposing arguments in order to apply pressure via their electorate.

> It's not all sunshine and rainbows of agreement among the EU member states.

No it's not, there are shady forces who dream about coercion for the worse.


This is underpants-gnomes thinking. If the compelling arguments were there and they were politically tenable, they would be voiced already.

Nobody is keeping obvious policy programs in their back pocket. Especially when politicians are chasing clips.


> The finding I did not expect: model quality matters more than token speed for agentic coding.

I'm really surprised how that was not obvious.

Also, instead of limiting context size to something like 32k, you can offload the MoE experts to the CPU with --cpu-moe, at the cost of roughly halving token generation speed.


Why would token speed matter for anything other than getting work done faster? It's in the name - "speed".

This would be true if the models were capable of always completing the tasks. But since their failure rate is fairly high, going in a wrong direction for longer could mean that you take more time than with a faster model, where you can spot it going wrong earlier.

Yeah, it’s like drinking coffee when being really tired. You’re still tired, just “faster”, it’s a weird sensation.

It's even stranger that it's not obvious to someone who uses Codex extensively every day.

The rate limiting step is the LLM going down stupid rabbit holes or overthinking hard and getting decision paralysis.

The only time raw speed really matters is if you are trying to add many, many lines of new code. But if you are doing that at token-limited rates, you are going to be approaching the singularity of an AI-slop codebase in no time.


Automated decision-making, such as banning, is restricted under the GDPR. Those facing this issue should file a complaint with their DPA. I'm sure Musk would love another series of fines in the EU.

Indeed, a very slop-feeling "whitepaper"; it might as well have been written by ChatGPT/Claude, because it has all the tropes.

Multiple sections have expandable subsections for more details on proposals.


Reads like asking for an EU handout. It touches on some visible issues in the single market, but most of what I've seen is not warranted, e.g. minimum spending quotas for AI work/integration/research, using European models (which basically today means: use Mistral), or carving out residency-process exceptions for AI researchers.
