Hacker News | zoba's comments

I actually don't know! Never had a voyage 200. I'd love to hear your experiences with it.

For TI-89, I recently updated the FAT engine to have height mappings. You can read more about it here: https://github.com/dzoba/ti-89-raycasting-with-z

I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.

Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to evals and not broadly capable.


I've noticed that open-weight models tend to hesitate to use tools or commands unless those appeared often in their training data, or you tell them very explicitly to use them in your AGENTS.md or prompt.

They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.

Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.


+1 to this. Anecdotally, I've found in my own evaluations that if your system prompt doesn't explicitly declare how to invoke a tool and, e.g., describe what each tool does, most models I've tried fail to call tools, or will try to call them but not necessarily use the right format. With the right prompt, meanwhile, even weak models shoot up in eval accuracy.


> [...] _but not necessarily use the right format._

This has also been my experience. But isn't the harness sending the instructions on how to invoke a tool? Maybe it is missing the formatting part. What do you think?
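For context on what "instructions on how to invoke a tool" usually means in practice: most harnesses serialize a declaration like the sketch below into the request (or into the system prompt, for models without native tool calling), and weaker models often need both the schema and a prose description to get the format right. This is an illustrative OpenAI-style function declaration; the tool name `run_shell` and its fields are made up for the example, not taken from any particular harness.

```python
import json

# One tool declaration: name, prose description, and a JSON Schema for
# the arguments. Models that struggle with tool format usually improve
# when every field here is spelled out this explicitly.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",  # illustrative name, not a real harness tool
            "description": "Execute a shell command in the project root "
                           "and return its stdout and stderr.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The exact command to run, "
                                       "e.g. 'npm create vite@latest'.",
                    },
                },
                "required": ["command"],
            },
        },
    },
]

# The harness sends this alongside the conversation, then parses the
# model's reply for a call matching the declared name and schema.
print(json.dumps(tools, indent=2))
```

If the harness only sends the schema but never describes when to use the tool, that would match the behavior described above: the model either avoids the tool or emits a malformed call.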


Have a frontier lab model do the plan, which is the most time-consuming part anyway, and then have a local LLM do the implementation. The frontier model can orchestrate your tickets, write a plan for each, and dispatch local LLM agents to implement at about 180 tokens/s; vLLM can probably manage something like 25 concurrent sessions on an RTX 6000. Do it all in worktrees, then have the frontier model do the review and merge.

I'm just a retired hobbyist tinkering on my personal projects, but that's my approach: I run everything through Gitea issues, each issue gets launched by the orchestrator in a new tmux window, and the two main agents (implementer and reviewer) get their own panes so I can see what's going on. I think Claude Code now has this somewhat streamlined as well, but I haven't seen a need to change my approach yet. Right now I just use Claude Code subagents, but I've been thinking of replacing them with some of these Qwen 3.5 models, because they do seem capable and I have the hardware to run them.
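The per-issue dispatch described above (one git worktree and one tmux window per issue, implementer and reviewer in separate panes) can be sketched roughly as follows. This is a dry run that only builds the command lines; the agent CLIs (`implementer`, `reviewer`) and the repo layout are assumptions for illustration, not a real harness:

```python
from pathlib import Path

def dispatch_commands(issue_id: int, repo: str = "myproject") -> list[list[str]]:
    """Build (but do not run) the commands to set up one issue's sandbox:
    a dedicated git worktree plus a tmux window with two panes."""
    branch = f"issue-{issue_id}"
    worktree = str(Path(repo) / ".worktrees" / branch)
    window = f"agent-{issue_id}"
    return [
        # isolated checkout so concurrent agents never touch each other's files
        ["git", "-C", repo, "worktree", "add", worktree, "-b", branch],
        # one tmux window per issue so progress is visible at a glance
        ["tmux", "new-window", "-n", window],
        # implementer in the first pane ("implementer" is a hypothetical CLI)
        ["tmux", "send-keys", "-t", window, f"implementer --cwd {worktree}", "Enter"],
        # reviewer in a split pane beside it ("reviewer" is also hypothetical)
        ["tmux", "split-window", "-h", "-t", window],
        ["tmux", "send-keys", "-t", window, f"reviewer --cwd {worktree}", "Enter"],
    ]

# Dry run: print what would be executed for issue #42.
for cmd in dispatch_commands(42):
    print(" ".join(cmd))
```

In a real setup the orchestrator would run each command with `subprocess.run` and tear the worktree down after the review/merge step.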


What is "the new qwen model"? There are a dozen of them, and you can get each in a dozen different quantizations (or more), each of a different quality.


In my experience Qwen3.5/Qwen3-Coder-Next perform best in their own harness, Qwen-Code. You can also crib the system prompt and tool definitions from there, though. One caveat: despite the Qwen models being the state of the art for local models, they're about a year behind anything you can pay for commercially, so asking one to build a new app from scratch might be a bit much.


Will this be called Web 4.0?


There was never a 3.0...



There's no 3.0 in Ba Sing Se


“Web 3” was crypto


It was originally the eternally-on-the-horizon Semantic Web, before somebody decided to reuse the name for something to do with crypto (perhaps without bothering to search for "web 3" beforehand)



Had great success with this prompt: “QA this website for me. Report all bugs”


iPhones are now so big we need a special carrying device for them.

Please bring back the mini :’(


I’d also be interested in this. Especially for Macs


I’m very excited for this. An early question I have: what would need to be done to make this a “thinking” model?


Thinking about what Jony Ive said about “owning the unintended consequence” of making screens ubiquitous, and how a voice-controlled, completely integrated service could be that new computing paradigm Sam was talking about when he said “You don’t get a new computing paradigm very often. There have been like only two in the last 50 years. … Let yourself be happy and surprised. It really is worth the wait.”

I suspect we’ll see stronger voice support, and deeper app integrations in the future. This is OpenAI dipping their toe in the water of the integrations part of the future Sam and Jony are imagining.

