Hacker News — BoredomIsFun's comments

> Chemical reactions are just math.

No, it is quantum mechanics. The physical world is not reducible to math; that has been argued since the early 20th century.


QM is described by math. There are fully deterministic QM interpretations, which are fully determined by math.

All the LFM models I've tried seemed to suffer from serious coherence issues. I found the Gemmas best at tasks requiring rock-solid coherent output; even Qwen isn't comparable.

I think context length is important to consider here.

I find Gemmas really good for a short conversation with maybe 3 or 4 exchanges of a few paragraphs each, which covers a surprisingly large amount of interactions.

For anything longer form though, particularly with larger code contexts, Qwen is far more useful for me personally.

I'm not an expert in this field, but my understanding is that Qwen uses a hybrid gated attention mechanism, whereas Gemma is a hybrid that includes a sliding-window attention mechanism, which makes it look like it favours the most recent tokens a little too much at times.

This is all in the context of local quantized models, I'm aware both have larger cloud variants that wouldn't suffer as much.
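To make the sliding-window point concrete, here's a minimal sketch (not either model's actual implementation, just an illustration of the masking idea) of how a sliding-window attention mask restricts each token to a fixed window of recent tokens, while a causal mask lets it see everything before it:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Token i may attend only to tokens j with i - window < j <= i,
    # so information older than `window` tokens must flow indirectly
    # through intermediate layers -- hence the recency bias.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Row 5 (the last token) can attend only to positions 3, 4, 5.
```

A full causal mask would be just `j <= i`; stacking sliding-window layers recovers a larger effective receptive field, but the nearest tokens still dominate.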


> Qwen 3.6 burns it to the ground.

Not for creative writing or NLP.


It feels like a pointless conversation if no sampler settings (min_p, temperature, etc.) are mentioned.
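For reference, here's a rough sketch of what those two knobs do together (an illustrative implementation, not any particular inference engine's code): temperature rescales the logits, and min-p drops every token whose probability falls below a fraction of the top token's probability.

```python
import numpy as np

def sample(logits, temperature=0.8, min_p=0.05, rng=None):
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # min-p filtering: keep only tokens with prob >= min_p * max prob,
    # then renormalize and sample from what survives.
    probs = np.where(probs >= min_p * probs.max(), probs, 0.0)
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(logits), p=probs))
```

Two runs with the same model but different temperature/min_p can produce wildly different output quality, which is why comparisons without these settings are hard to interpret.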


> An LLM is a router and completely stateless aside from the context you feed into it.

Not the latest SSM and hybrid-attention ones.
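The distinction is easy to see in a toy linear state-space recurrence (a hypothetical minimal sketch, not any production SSM): the hidden state `h` carries information across steps, so the layer's output depends on more than the current input.

```python
import numpy as np

def ssm_step(h, x, A, B, C):
    # State update depends on the *previous* state -- this is what
    # makes the layer stateful, unlike a pure attention-over-context pass.
    h = A @ h + B @ x
    y = C @ h
    return h, y

d_state, d_in = 4, 2
A = 0.9 * np.eye(d_state)          # decaying memory
B = np.ones((d_state, d_in))
C = np.ones((1, d_state))

h = np.zeros(d_state)
for x in [np.array([1.0, 0.0]), np.array([0.0, 0.0])]:
    h, y = ssm_step(h, x, A, B, C)
# Even with an all-zero input on the second step, y is nonzero:
# the state remembers the first input.
```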


Going from a stateless router to a router with a lossy scratchpad is a step up, but I'm still not going to ask it to check my Lisp. That's what linters are for.


Good old illustration: https://www.ml6.eu/en/blog/large-language-models-to-fine-tun...

The it- one is the yellow smiling dot; the pt- one is the rightmost monster head.


> If I offend anyone I will not be apologising for it.

What you said is simply counterfactual, so no reason to be offended.


Asimov is a widespread last name in the ex-USSR, esp. Central Asia. I personally know three unrelated Asimovs.


> Local model enthusiasts often assume that running locally is more energy efficient than running in a data center,

It is a well-known 101 truism in /r/Localllama that local is rarely cheaper, unless run batched; then it is indeed massively cheaper, around 10x.

> I think they mean that the DeepSeek API charges are less than it would cost for the electricity to run a local model.

Because it is hosted in China, where energy is cheap. In the ex-USSR, where I live, it is inexpensive too, and considering that I had to use a small space heater all winter because of my inadequate central heating, running local models came out 100% free.


Hmm... no. These two things are orthogonal. Regardless, the Olmo models are open source.

