The LFM models I've tried all seemed to suffer from serious coherence issues. I found the Gemmas best at tasks requiring rock-solid coherent output; even Qwen isn't comparable.
I think context length is important to consider here.
I find the Gemmas really good for short conversations of maybe 3 or 4 exchanges of a few paragraphs each, which covers a surprisingly large share of interactions.
For anything longer form though, particularly with larger code contexts, Qwen is far more useful for me personally.
I'm not an expert in this field, but my understanding is that Qwen uses a hybrid gated attention mechanism, whereas Gemma is a hybrid that includes a sliding-window attention mechanism, which can make it look like it favours the most recent tokens a little too much at times.
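To make the recency-bias intuition concrete, here's a minimal sketch of what a causal sliding-window attention mask looks like. The window size and sequence length are made-up toy numbers, not anything from Gemma's actual config; this only illustrates the masking pattern, not either model's real architecture.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # mask[i, j] is True when query position i may attend to key position j:
    # causal (j <= i) and within the last `window` tokens (i - j < window).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=6, window=3)
# The last query (row 5) can only attend to positions 3, 4, 5; anything
# earlier falls outside the window. That hard cutoff on older tokens is
# the "favours recent tokens" effect, mitigated in practice by the
# hybrid layers that still use full attention.
```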
This is all in the context of local quantized models; I'm aware both have larger cloud variants that wouldn't suffer as much.
> Local model enthusiasts often assume that running locally is more energy efficient than running in a data center,
It is a well-known 101 truism in /r/Localllama that local is rarely cheaper, unless you run batched, in which case it can indeed be massively cheaper, on the order of 10x.
> I think they mean that the DeepSeek API charges are less than it would cost for the electricity to run a local model.
Because it is hosted in China, where energy is cheap. In the ex-USSR, where I live, it is inexpensive too, and given that I had to run a small space heater all winter due to my inadequate central heating, running locally came out effectively 100% free, since the GPU's waste heat replaced the heater.
No, it is quantum mechanics. The physical world is not reducible to math; that has been argued since the early 20th century.