Hacker News | zozbot234's comments

I assume these are just output layers that are trained on the hidden state from the larger model - that's how MTP works. It's not a separate drafting model.
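For intuition, the verify step in MTP-style speculative decoding boils down to accepting the longest prefix of drafted tokens that the main model agrees with. A minimal sketch (token IDs as plain integers, greedy acceptance; illustrative, not any particular implementation):

```rust
// Greedy speculative-decoding acceptance: keep drafted tokens only while they
// match what the main model would have produced at each position.
fn accept_prefix(draft: &[u32], verified: &[u32]) -> usize {
    draft.iter().zip(verified).take_while(|(d, v)| d == v).count()
}

fn main() {
    let draft = [5, 9, 2, 7];    // tokens proposed by the MTP heads
    let verified = [5, 9, 4, 7]; // tokens the main model computes in one batched pass
    // First mismatch at index 2, so we keep 2 drafted tokens (plus the
    // main model's own correction, in a real decoder).
    assert_eq!(accept_prefix(&draft, &verified), 2);
}
```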

What's "sad" is how slow the ollama folks are being in vendoring newer versions of ggml into their codebase. That attitude just leaves them stranded without access to newer features.

> But I think the key is that in the standard autoregressive case we get memory bandwidth bound, so there are tons of idle compute resources.

Right, this is the same way batching works. It's "free" until we exhaust available compute resources, at which point decode throughput becomes compute bound. (This is a good place to be, because scaling out compute is a lot easier than adding fast VRAM.) This is why MTP is mostly useful when you have one or a few users, which means compute is abundant. When you're running large batches, you're better off using that compute to grow your batch size.

Of course, batch size is usually limited by things like bulky KV caches. So perhaps MTP has some residual use in that setting. But if you're sharing cached context in a subagent swarm, or running a model like the recent DeepSeek V4 with its tiny KV cache, you can go a lot further in processing a larger batch.
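A back-of-the-envelope roofline makes the tradeoff concrete: each decode step must both read the weights (a bandwidth cost, amortized over the batch) and do the per-sequence matmuls (a compute cost that grows with batch size). The numbers below are made up for illustration, not taken from any real model:

```rust
// Roofline sketch for batched decode: step time is whichever cost dominates,
// one weight read (bandwidth) or batch-scaled matmuls (compute).
fn step_time_s(batch: f64, params_bytes: f64, bw_bytes_s: f64,
               flops_per_tok: f64, flops_s: f64) -> f64 {
    let mem = params_bytes / bw_bytes_s;           // one weight read per step
    let compute = batch * flops_per_tok / flops_s; // grows with batch size
    mem.max(compute)
}

fn main() {
    // Illustrative numbers: 14 GB of weights, 1 TB/s VRAM, 100 TFLOP/s.
    let (p, bw, f_tok, f) = (14e9, 1.0e12, 14e9, 100e12);
    // Small batch: memory bound, so the leftover compute is "free" for MTP.
    assert_eq!(step_time_s(1.0, p, bw, f_tok, f), p / bw);
    // Large batch: compute bound, so MTP would steal capacity from batching.
    assert!(step_time_s(512.0, p, bw, f_tok, f) > p / bw);
}
```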


You can disaggregate though. So draft models can run on cheaper hardware with less RAM, saving time on the more expensive machines with more RAM.

I think it also gets use in the /fast modes the providers sell at higher cost.

They probably use it on all models. "Fast" is probably just a resource pool with less congestion, and therefore higher per-user throughput, but it's less efficient overall.

Looks like this was not an open release, the latest GLM-xV release was 4.6V and Turbo models were never open.

> In an agentic world, the OS needs to be completely rethought. For example, every single app functionality should be exposable via an API while remaining human friendly.

So, like a Unix system?


> I don’t understand why Rust even has panics if its primary goal is safety. We should be able to prove that the code has no paths that may panic ever. I’ve been looking at this all week. It’s very difficult to make a program that is guaranteed not to panic.

The Rust-in-Linux folks are working on this with things like fallible memory allocation. It's required for their own use. Increased use of proofs (such as proving that an array is non-empty) is also slowly being worked on.
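For a flavor of what avoiding panics looks like in practice: replace panicking operations with fallible ones, so the failure path is visible in the types. A minimal sketch in plain stable Rust (not the kernel's own APIs):

```rust
// Panicking version (for contrast): `xs[0]` aborts at runtime if `xs` is empty.
// fn first(xs: &[i32]) -> i32 { xs[0] }

// Panic-free version: the empty case is surfaced in the return type.
fn first(xs: &[i32]) -> Option<i32> {
    xs.first().copied()
}

fn main() {
    assert_eq!(first(&[3, 1]), Some(3));
    assert_eq!(first(&[]), None);

    // Fallible allocation (stable std): report failure instead of aborting,
    // which is the same shape the kernel work needs for memory operations.
    let mut v: Vec<u8> = Vec::new();
    assert!(v.try_reserve(1024).is_ok());
}
```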


> there was zero indication this was an experiment

  The goal of Phase A is a **draft** `.rs` next to the `.zig`
  that captures the logic faithfully — it does **not** need to compile. Phase B
  makes it compile crate-by-crate.
I mean, it would be hard to spell it out any clearer than that! Code that fails to compile is just not very useful for real work.

Phase B clearly says compilation is the next goal. The first goal is to get like-for-like logic, the second goal is to get it to compile. Can you guess what the third goal will be? Throw out the code?

The branch is named phase-a-port and the document explains what "phase-a" means. It's quite clear.

Yes, but that would require people to read past the title. You can't get a proper knee-jerk first post in if you do that! Completely unfair to expect people to make that sacrifice/effort.

[there was some sarcasm there, BTW, if anyone has a faulty detector that didn't pick up on it]


"Writing code" as a task of its own is called cowboy coding. It's neat that AI can do this now, but that has nothing to do with proper software engineering which always starts from a careful, human-led design.

"has nothing to do with proper software engineering"

So you're saying software engineers don't write code? Just because there are other things that SWEs do doesn't mean coding has nothing to do with it.

It's arguably a pretty important part. Would you really hire a software engineer who can't code?


Writing code and copying the output of an LLM is absolutely not the same.

You wouldn't call someone an author if they just take LLM outputs and shove them into a book. IDK why this distinction doesn't apply to devs too.


You call someone an author when they use a ghostwriter. They're giving inputs that are core to the output, even though they aren't doing all the writing. Same thing.

I can assure you a sizable amount of people in the writing community look down on "authors" that only use ghostwriters.

Why do tech workers act shocked that people hate this junk being force-fed to them, to the point that they are now resorting to violence to reject it?

You think telling humans with specialized crafts that they don't matter is good politics? Good grief.


Of course.

I'm not surprised at all that devs are upset.

>You think telling humans with specialized crafts that they don't matter is good politics? Good grief.

Yeah, of course not. There are lots of historical examples of this. That being said, those historical examples don't play out well for the craftsmen, either.

Look, I'm a SWE myself. I see my job drastically changing right in front of my eyes. I know there's nuance to it, too, that's hard to articulate in these comment threads.

But I think a lot of people here are biased toward thinking that they are irreplaceable - I've definitely been in that camp. I don't think that it's wise, however.


>You call someone an author when they use a ghostwriter.

i don't know about you, but i absolutely don't. either you write the book yourself or you are not the author.

as kendrick lamar wrote:

I can dig rappin', but a rapper with a ghostwriter?

What the fuck happened? (Oh no)


Or even more appropriate: a movie director is almost never on-screen but the actors aren't the ones determining the shots to use or writing the script.

What's a good example of human-led design?

Yes and every AI-first development workflow worth its salt does exactly this, and it does it much more thoroughly than I’ve ever seen a team of meatbags do it.

My workflow, at a high level, is:

1. I write a high level spec. Not as high level as a single-sentence prompt, but high level enough to capture my top requirements.

2. I prompt the AI to interview me about the spec to clear up any ambiguity or open questions, then when I’m satisfied, the AI writes a longer spec, which I then review.

3. Then I prompt the AI to write an implementation plan based on the spec. I might just skim this, and by this point I might be asking the LLM more questions than it’s asking me.

4. Now I hand it off to the implementer agent.
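The four steps above can be sketched as a pipeline. Everything here is hypothetical: `ask_llm` stands in for whatever model API is actually used, and the prompts are placeholders:

```rust
// Hypothetical stand-in for a real model API call; a real implementation
// would hit an LLM endpoint here.
fn ask_llm(prompt: &str) -> String {
    format!("[model output for: {prompt}]")
}

fn main() {
    // Step 1: human-written high-level spec.
    let spec = "High-level spec: ...";
    // Step 2: model interviews the human, then expands the spec (human-reviewed).
    let questions = ask_llm(&format!("Interview me about this spec:\n{spec}"));
    let long_spec = ask_llm(&format!("Expand into a full spec:\n{spec}\n{questions}"));
    // Step 3: model drafts an implementation plan (human skims it).
    let plan = ask_llm(&format!("Write an implementation plan for:\n{long_spec}"));
    // Step 4: hand off to the implementer agent.
    let _handoff = ask_llm(&format!("Implement this plan:\n{plan}"));
}
```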

This isn’t cowboy coding, it’s not even agile. It’s waterfall. The problem with doing waterfall was that it was too slow, especially with the deserialization/serialization cost of routing all of this documentation through meatbrains. The LLM is doing just as much work, true, but faster.

The thing I found surprising was that, while LLMs are still pretty awful at writing as an art form, they are better technical writers than I have the time to be, especially when writing for an audience of other LLMs.


Is this project in production and for how long? How many users?

> Nearly every time I call a function I don't want to have to care if it is synchronous or not.

The problem is that "nearly every time" bit. There are times when you're looking at the code and absolutely want to be aware of where the function is suspending. It's similar to the use of ? in error handling, which surfaces all fallible operations that might do an abnormal return.
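In Rust, both kinds of control-flow exit are marked at the call site: `?` for early returns, `.await` for suspension points. A minimal sketch of the `?` half (with hypothetical inputs; the same visibility argument applies to `.await` in async code):

```rust
use std::num::ParseIntError;

// Every `?` marks a place where the function may return early with an error,
// just as every `.await` in an async fn marks a place where it may suspend.
fn parse_pair(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.parse()?; // visible fallible operation
    let y: i32 = b.parse()?; // another one; no hidden exits
    Ok(x + y)
}

fn main() {
    assert_eq!(parse_pair("2", "3"), Ok(5));
    assert!(parse_pair("2", "oops").is_err());
}
```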


A manager doesn't have to look at the code that's being shipped. An IC will still need to do that, and this will eventually take up much of their work. It can be addressed by moving up the stack to higher level and more strictly checked languages, where there's overall less stuff to review manually.

People typically think it's not a new person's fault if they come in to a team and bring down production.

That's a failure of the existing infrastructure to allow someone to do this.

LLM coding will work like this.

If you're letting LLMs go wild with no system in place to automatically know they're moving in the right direction and "shipping" things up to your standards, the failure is you, not the LLM.


The dirty secret is all the people talking about shipping 4 features a day etc are just lying about reviewing anything. They don’t review it at all.

I didn't say shipping a day. I said shipping at the same time.

The review comes at the end, though I truly believe this will go away as well. Agents will also get better at review until they're good enough that no one will want to do it anyways. Good enough is good enough.


I review more thoroughly and faster with Claude than without.

Claude absolutely improves code review quality, but it still misses a lot. It's a second pair of eyes; it doesn't replace or remove the work you have to put in to fully review the code yourself.

It's like saying that you code reviewed faster just because someone else also reviewed the code, that's not how it works.


Agree, and with CC my volume and quality of PR review have substantially increased since 4.5. Without CC for review we would have a ridiculous bottleneck in our dev/qa pipeline.

I'm faster, sure, but more thorough, no. The same, because I was already very careful. But it's not a massive win either; 4.7 misses too much still because it would need to read too much of the context each time to understand the architectural problems I'm catching.

It's nice to not have to care about nits and other things that we don't have lints for though, so that's useful.


Spot on. When will the cretins understand, it's not about how much code you can generate.

Just like a manager, you don't need to look at the code. You need to set up quality systems to provide evidence the code does what it is supposed to do, just like a manager.

I’ve never met a manager who has set up “quality” systems to ensure that the job is done correctly. Their actions are always reactive, and not pertaining to code at all. The overarching contract is “You do a bad job, you will be fired”.

Code review has a number of important purposes beyond merely verifying functionality. It's true that some managers don't recognize this, fail to allocate time for anything but feature work, and then wonder a few years later why the software is so buggy and new feature development is so hard.
