Hacker News | zkmon's comments

I have tested Gemma4-26B against Qwen3.6-35B. Gemma beats Qwen on structured data extraction and instruction following. Gemma is far more precise than Qwen in these tasks, while Qwen gets a bit more creative, verbose, and imprecise. However, Qwen has far more general smartness and higher token throughput. Qwen could precisely pinpoint issues in data quality and code, while Gemma had no clue. On coding skills, Qwen appears to have an edge over Gemma, but this could depend on the agent you use. For direct chat (llama_cpp UI), both models show the same skills for coding.

That's interesting. I've been using Qwen3.5-35B for (poorly) structured table extraction, based largely on reports that Qwen had a much better vision implementation.

I have not benchmarked Qwen3.5 vs. Qwen3.6 for the same task, nor trialed Gemma4-26B. Guess it's time for some testing!


Github, Java, Python, Whatsapp, Gmail, SWIFT, DNS, cloud infra, Appstore, Playstore - all can become tools in the hands of the powerful.

SVN (Subversion) was working excellently for my team, about 20 years back. I never saw sufficient justification for the complexity Git brought in.

But as I say, new tech invades the world and makes perfectly working old tech incompatible, just by changing the world around it. So Git became an imposed necessity.


There was big hype around it (you probably remember it) and around distributed version control: if you weren't using a DVCS, you were suddenly seen as an inferior programmer, and hence your employment opportunities were diminishing. That attitude toward almost all things programming-related has only accelerated ("if you're not quick to adapt to agentic AI you will lose your cushy job!"), with the recent AI craze the latest example.

Though everyone calls a solid blue color "blue", the actual visual perception or experience of that color could be entirely different from person to person. They just grew up calling that private experience of their own "blue".

> Selling the solution to the problem you caused ought to be illegal.

Most tech solutions are built on problems they themselves created. This includes phones, cars, computers, every software upgrade, and almost every electronic gadget. You are forced to use them because the world around you is no longer compatible with the way of life that existed before this tech was introduced.


I probably agree with you but what on earth are phones and cars doing in this list? They solve obvious physical problems not caused by a company.

My interpretation would be that cars are necessary to live in places where urban design assumes that we'll use cars to get around. Many cities are designed this way.

Similarly, phones are required now for some activities, like online banking. First it was an option, then it became the norm.


Exactly.

General Motors contributed significantly to the decline of passenger rail in the USA.

See https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...


I’ve spent some time thinking about this. I’ve basically decided this is nonsense. Please give me the problem that the following both created and solved: phones, cars, computers.

In my view the problems of “communication over distances”, “quickly traveling over distances” and “having information to process” all existed as problems well before we invented these solutions.

So that leaves only the idea that we're forced to use these tools because the world has changed. That is frequently true, but it hinges on the amount of effort an individual is willing to make - and their economic prowess in an economy that rewards groupthink.

I stand by my original statement: selling solutions to problems you created ought to be illegal.


The biggest rule-break was committed not by the agent or the infra company, but by the person who gave such elevated authorization (an API key) to an autonomous bot.

Isn’t the biggest rule to have working backups with a 3-2-1 strategy?

That's not what happened.

If an API key with full perms was put in a place where the agent could access it, that is the biggest problem.

That somebody made a key that can delete prod when they don't need to delete prod is the underlying problem with that.

And underlying that still is that the staging environments were on the same account as prod.


You’re very defensive in these comments - are you the author?

Yesterday was a realization point for me. I gave a simple extraction task to Claude Code with a local LLM and it "whirred" and "purred" for 10 minutes. Then I submitted the same data and prompt directly to the model via the llama_cpp chat UI and the model single-shotted it in under a minute. So obviously something is wrong with the coding agent or the way it talks to the LLM.

Now I'm looking for an extremely simple open-source coding agent. Nanocoder doesn't seem to install on my Mac and it brings node_modules bloat, so no. Opencode seems not quite open source. For now, I'm doing the coding agent's work myself, using the llama_cpp web UI. Chugging along fine.


https://pi.dev/ seems popular. What's not open source about Opencode? The repo has an MIT license.

Been LOVING Pi so far!

Some people believe only copyleft licenses are open source. They're right on principle, wrong in (legal) practice.

They're not even right on principle: https://www.gnu.org/licenses/license-list.html

Even the FSF recognizes that non-copyleft licenses still respect the four freedoms, and are therefore still Free Software.


+1 for pi. I used claude and opencode but pi is the first agent tool that made me excited about the whole thing.

Maybe it's just my impression, but it asks to update/upgrade continuously.

It's completely open source, but is under heavy continual development (likely a lot of AI coding).

On launch, it checks for updates and autoupdates.


It doesn't auto update. Maybe you have an extension?


Probably a silly idea, but I'll throw it into the mix: have your current AI build one for you. You can have exactly the coding agent you want, especially if you're looking for "extremely simple".

I got annoyed enough with Anthropic's weird behavior this week to actually try this, and got something workable up & running in a few days. My case was unique: there's no Claude Code for BeOS, or my older / ancient Macs, so it was easier to bootstrap & stitch something together if I really wanted an agentic coding tool on those platforms. You'll learn a lot about how models actually work in the process, and how much crazy ridiculous bandaid patching is happening in Claude Code. Though you might also appreciate some of the difficulties the agents / harnesses have to solve. (And to be clear, I'm still using CC when I'm on a platform that supports it.)

As for the llama_cpp vs. Claude Code delays - I've run into that too. My theory is that the API is prioritized over Claude Code subscription traffic. The API certainly feels way faster. But you're also paying significantly more.


Just in case it didn't occur to you already, you can just build whatever coding agent you want. They're pretty simple.
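To make "they're pretty simple" concrete, here is a minimal single-file sketch of such an agent. Everything in it is an assumption for illustration: it assumes a llama.cpp server running locally with its OpenAI-compatible /v1/chat/completions endpoint, and the JSON tool-call protocol (read_file / write_file / shell) is ad hoc, invented for this sketch rather than any standard.

```python
# Minimal coding-agent loop (illustrative sketch, not a finished tool).
# Assumes: llama.cpp server at BASE_URL with an OpenAI-compatible API.
import json
import subprocess
import urllib.request

BASE_URL = "http://localhost:8080/v1/chat/completions"  # assumed local server

def run_tool(name, args):
    """Dispatch a tool call requested by the model."""
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    if name == "write_file":
        with open(args["path"], "w") as f:
            f.write(args["content"])
        return "ok"
    if name == "shell":
        out = subprocess.run(args["cmd"], shell=True,
                             capture_output=True, text=True, timeout=60)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

def chat(messages):
    """Send the conversation to the local server, return the reply message."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]

def agent(task, max_steps=10):
    """Loop: ask the model; if it replies with a JSON tool call, run it and
    feed the result back; a plain-text reply means it is done."""
    messages = [
        {"role": "system", "content":
         'You are a coding agent. To use a tool, reply with ONLY a JSON '
         'object like {"tool": "read_file", "args": {"path": "x.py"}}. '
         "Tools: read_file, write_file, shell. Otherwise answer in plain text."},
        {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = chat(messages)
        messages.append(reply)
        try:
            call = json.loads(reply["content"])
        except (ValueError, TypeError):
            return reply["content"]  # plain text: the model is done
        messages.append({"role": "user",
                         "content": run_tool(call["tool"], call.get("args", {}))})
    return "step limit reached"
```

That's essentially the whole trick: a while loop, a tool dispatcher, and a prompt. Everything past this is error handling, context management, and polish.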

Swival is not bloated and was specifically made for local agents: https://swival.dev

pi.dev as well

I use both Cursor and Claude Code, and yes, the latter is noticeably slower with the same model at the same settings.

However, it's hard to justify Cursor's cost. My bill was $1,500/mo at one point, which is what encouraged me to give CC a try.


You'd figure by now we would have something between a TUI and an IDE.

You can run CC with local models, it's pretty straightforward. I've done this with vLLM + a thin shim to change the endpoint syntax.
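The "thin shim" can amount to little more than one request-body translation. The following is a hypothetical sketch of mapping an Anthropic-style Messages request onto an OpenAI-style chat-completions request; it ignores streaming and tool-use blocks, and the actual proxy wiring (an HTTP server forwarding to vLLM) is left out.

```python
# Hypothetical shim sketch: translate an Anthropic Messages request body
# into an OpenAI chat-completions body for a local vLLM / llama.cpp server.
def anthropic_to_openai(body: dict) -> dict:
    messages = []
    # Anthropic keeps the system prompt outside the messages list.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for m in body.get("messages", []):
        content = m["content"]
        # Anthropic content may be a list of blocks; flatten the text blocks.
        if isinstance(content, list):
            content = "".join(b.get("text", "") for b in content)
        messages.append({"role": m["role"], "content": content})
    return {
        "model": body.get("model", "local"),
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 1.0),
    }
```

Drop that into any small HTTP proxy and point the client's base URL at it.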

What model did you use with llama_cpp?

Qwen3.6-35B quant-4 gguf

They released a 1.6T pro base model on Hugging Face. First time I'm seeing a "T" model here.

Kimi K2.5 and K2.6 are both >1T

Bots would win over all anti-spam and anti-slop measures. All blog posts and comments everywhere would be filled with spam and slop. That's when humanity would turn its head away from screens, back toward other humans nearby, and start talking to each other, while the ocean of slop and spam keeps bubbling, infested with bots.

On llama server, Q4_K_M gives about 91k context on 24GB, which works out to about 70MB per 1K context tokens (KV cache). I could have gone for Q5, which would probably leave about 30K tokens of space. I think this is pretty impressive.
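For anyone wanting to sanity-check numbers like this: per-token KV-cache cost follows directly from the model architecture. A back-of-envelope sketch, where the layer/head/dim values are placeholders for illustration, not the actual config of any model discussed here:

```python
# Back-of-envelope KV-cache sizing. Per layer, the cache stores one K and
# one V vector of n_kv_heads * head_dim elements for every token.
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical architecture (placeholder values), fp16 cache:
per_tok = kv_bytes_per_token(n_layers=48, n_kv_heads=8, head_dim=128)
print(per_tok / 1024, "KiB/token")            # 192.0 KiB/token
print(per_tok * 1000 / 2**20, "MiB per 1K")   # 187.5 MiB per 1K tokens
# llama.cpp can also quantize the KV cache itself (e.g. q8_0),
# roughly halving bytes_per_elem and hence the total.
```

Plugging in a real model's layer count, KV-head count, and head dimension (from its config.json) tells you how much of your VRAM budget each 1K of context will eat.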

I have been getting good results with IQ4_NL and TurboQuant at 4 bits on 24GB (3090). It easily fits 256k context with that setup, but it starts slowing down quite a bit after 80-100k. Quality in my testing is also still good:

- Coding task test: https://github.com/sleepyeldrazi/llm_programming_tests/
- Design task test: https://github.com/sleepyeldrazi/llm-design-showcase

Coding was tested against minimax-m2.7 and glm-5, and the design against other small models.



