Hacker News | montroser's comments

The result is ~12 tokens per second, as reported by OP further down in these comments.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.


Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter: often 100-500x cheaper than SOTA models, and 2-5x faster in TPS.

Right. You can also perform RSA encryption on pencil and paper with a scientific calculator. It works, but it's not useful throughput for serious work.

Yeah took way too long to find that result. Being able to run on slow RAM isn't surprising considering you can run a model off an SSD.

I was about to ask that

It's not terrible for interactive... https://mikeveerman.github.io/tokenspeed/?rate=12&mode=text

And it should be just fine for plenty of background use cases.


This is my daily driver laptop. It's pretty good for what it is. Runs Linux perfectly, not trying to be especially fast, very nice pixel density, all metal case, sturdy build. Battery life is not the best. Beautifully compact.

In practice, my experience is that it's mostly a lose-lose proposition. You have to invest in learning a bunch of same-but-different framework APIs to do what the language already does natively. And in return, the code is more complex, and harder to debug, and so it has more bugs.

We once hired a very smart fellow to build out a media processing pipeline. He did it with rxjs, but it wasn't scaling well. We tried to get with the paradigm for a bit and help scale it up, but flame graphs in profiler output were all crazy, and it was a pain to wire in timing traces, etc. We built a POC imperative version just to prove that we could indeed achieve the throughput we thought we could, and then we just said, well hey, this is faster and simpler, so... let's just go with this instead. And so we did.


Yeah, forgot to mention that in my comment.

Tools are all built around non-rx workflows. That means doing things like performance work is a lot harder. Especially because RX works by message passing, it can be really tricky to figure out where a pipeline is ultimately stalling.


Well, my team does what we call vibe engineering.

You do ask hard questions up front, define boundaries, give lots of high level architectural guidance, declare interfaces, and bounds of abstraction... And then you ask the LLM to make it so, and it does. You give it the structure, and it fills in the implementation.

This is engineering, more or less.


The rest of us do similar work with the model to build richly defined spec files. Spec driven development is a lost art with non-AI coding anyway. People who complain about these things either do not work with people using AI correctly and don't realize they just need to push for better standards, or they are terrible at architecting. I notice everyone who hates AI just finds any excuse to dismiss it instead of actually pushing towards more effective ways of using it.

I would love to see how perfect their “organic” code looks because I won't be surprised if it's full of all sorts of issues, all in prod, for years, never known or spotted, just to be found and fixed by Claude in 15 minutes, with unit testing to test and ensure no regression is introduced.


> using AI correct

There is no "correct" way to toss a coin. Some day people who are depending on LLMs blindly will understand that. All their notions of "correct use" are based on folklore and... vibes, which just maximizes token use under some misguided notion of "correctness".


LLMs are not deterministic. They do typically behave within pretty reasonable boundaries. Humans are not deterministic. They also typically behave within pretty reasonable boundaries. Engineering with LLMs and humans means understanding those boundaries and designing for them. This is a legitimate engineering problem like any other. I think the main misalignment I see is the expected productivity gain. When you are using real engineering discipline it is still very productive to use AI for coding, but not nearly so productive as many people claim once you factor in the fragility of their systems.

There is no correct use. There is no “correct” way to build systems. There is principled and disciplined use.


>Humans are not deterministic.

There is a pretty big difference. If you ask a human "Is X true", and they say "yes", you can be 100% sure that they will always behave in a way that is logically consistent with X being true (talking about a competent and honest human being here, and when the implication is obvious). But by their very nature, there is no reason to assume the same with LLMs.


Tbh you can treat Claude like a Junior developer and give it detailed feedback.

genuine question. if i have a tightly-defined unit test and Claude writes a blob of code that passes it, does it matter what's in the blob?

It matters for at least a few reasons:

- Depending on the nature of your application, it may be very important to be able to audit the business logic and intended behavior. For compliance reasons, for operational reasons, for moral/ethical reasons -- you very well might want to affirm what the code is actually trying to do.

- A coding agent may get very creative in order to write code that passes a tightly-defined unit test. It may come up with approaches that technically pass, but work against the overall intention of the app in the first place. This becomes an arms race rather than a productive collaboration, where the agent's increasing creativity has to be matched by a sprawling test suite.

- Eventually, inevitably, business requirements will change, and the blob will need to evolve. It will be much easier for an agent or a human alike to understand how to safely make the change, if the existing implementation is transparent and understandable.


Two possibilities:

1. Your unit tests are exacting enough to fully specify the unit. In that case, congratulations, your unit tests are the code. They're also probably much more awkward to write, maintain, etc. Also, the compilation step to go from the unit tests to the actual code is now orders of magnitude more expensive, requires a SaaS to even work, etc.

2. Your unit tests are not that exacting and still leave ambiguity, edge cases, etc. In that case it very much matters what's in the blob of code, because while it could be a correct implementation of what you wanted, it could also be something else entirely that just happens to be correct for the part you did specify.
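
A hypothetical sketch of case 2 in bash (the function names and the leap-year example are made up for illustration, not taken from anyone's codebase). The test is the entire spec, the implementation passes it, and it is still wrong for an input the test never mentions:

```bash
#!/usr/bin/env bash

# The whole "spec": one test with two cases.
test_is_leap_year() {
  is_leap_year 2024 && ! is_leap_year 2023
}

# An implementation that satisfies the test above...
is_leap_year() {
  (( $1 % 4 == 0 ))
}

test_is_leap_year && echo "test passes"

# ...but it also calls 1900 a leap year, which the Gregorian rule does not.
is_leap_year 1900 && echo "1900 reported as a leap year"
```

Nothing in the test distinguishes this blob from a correct one, which is roughly the point about ambiguity above.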


If the test passes, you review the blob, and QA tests it. I don't see why it's any different from you having copied code from StackOverflow.

It does not matter for one instance. But it does matter if you plan to make a living off it.

Who ensures it followed the specs?

The more context an LLM gets, the more likely it will start to ignore instructions.

If the LLM runs a context compression, all bets are off. There's a reason Anthropic upped the context to 1M tokens: to reduce the chance of this happening.


> Who ensures it followed the specs?

The human. But only if you care about verification.


The human is missing from OP's description. "and it fills in the implementation". No human in sight.

You can't call it "engineering" if you don't care about verification.


If you build a bridge, the engineers aren't the ones doing the welding and crane operation and bolts and digging holes and whatnot.

They're the ones checking that work matches the plan.


Come on, now. The human writes the plan up front, which includes guidance on testing strategy, classes of tests, particular test cases to cover, etc. And just like normal, of course you don't just ship the code without doing manual verification, code review, auditing the test cases, and all the rest.

> Who ensures it followed the specs?

I mean, it's the same with building a bridge in the real world, right?

Someone has to check the work.


How do you do this? I really struggle to get the agents to follow my architectural invariants and coding conventions.

I use Cursor and Codex but the agents keep making regressions and breaking rules. They'll even take shortcuts sometimes, by doing things to make tests pass but with code that would be dangerous in prod.

Now, I use them file by file but it feels more like a typing assistant than something much more.


As of today they can't. You have to tell them what the new API looks like, which new classes they have to create and describe them in detail, etc... You have new projects that try to add good practices in the prompt [0] or audit your code once in a while [1] but it's not enough.

Right now they can be autonomous at finding bugs and inconsistencies. But not at architecture, or even just at creating a long enough PR without any guidance and feedback.

[0] https://github.com/ChristopherKahler/carl

[1] https://github.com/ChristopherKahler/aegis


When your AI slides, make a permanent test that catches that particular slide. Then have it run all the tests every time it does something significant.

We have as much test code as deployable code because the AI keeps finding ways to do what we told it to, but not what we meant.


> This is engineering, more or less.

People who build bridges for a living shake their heads in dismay.


At least bridges live in the realm of unchanging physics and unchanging material behavior. There is only so much variety in building bridges.

Software on the other hand...

You see, the difference is that with building bridges, there is no value in building a "Toy" bridge that does not require any real knowledge. But even toy software can bring huge value. But that does not mean it does not require engineering discipline to build non-toy software.

Software engineering is not about learning libraries or tools. It is the art and science of managing complexity under constant change.


They already were well before LLMs.

    myFramework new myCoolStartup
    myFramework generate dataModel
    myFramework generate controllerForModel
    myPackageManager install coolViewWidgets
    # insert glue code I learned on Youtube here
    git push coolPaaS myBranch

If LLMs don't fix things, why are we spending trillions of dollars boiling the ocean?

The bridge architects and engineers are not the ones hammering in the nails.

Clearly you have never worked in heavy industry, or you would know that the word "build" is used at all levels, all the way up to architect and real estate development level. Example:

https://www.buildordie.com/ (Build or Die is the web site for a mid-sized architecture firm.)


Even before AI we had seen drastic drops in software quality across the board. Even Windows has been going downhill for several major versions now.

What's your point?

You seem to be implying that since the current wave of AI started that things have gotten better. That is demonstrably, repeatedly, and completely false. Just cruise the HN front page and watch the AI fails scroll by.

That you point to Windows getting bad over the years, and the fact that it continues to get worse with the full AI buy-in of Microsoft, shows that AI is not some magical software savior.


When something is wrong everyone complains; when nothing is wrong you rarely hear a peep. I've either shipped to production or helped others ship code that others would blindly describe as "AI slop", despite my reviewing and testing it. I've seen firsthand AI-first greenfield projects go into production and help small businesses achieve more sales and success. Heck, I reviewed such code for a relative who is now hiring developers and lets them code with AI so long as they review it, because it gave him something no software company in his market would offer.

Slate is just some renderings though, right? Is there anything actually real about it more than just marketing?

Happen to be on their email list. They are taking orders soon and announcing pricing on 6/24. Initial delivery expected toward the end of the year.

This could very well be a pattern that some teams evolve into. Specs are the new source -- they describe the architectural approach, as well as the business rules and user experience details. End to end tests are described here too. This all is what goes through PRs and review process, and the code becomes a build artifact.

It just doesn’t work though. Anthropic couldn’t even get Claude to build a working C compiler, which has a way better specification than any team can write and multiple reference implementations.

Many, many people have tried...

Yeah. I will probably join their ranks at some point.

The Bash maintainer actually implemented the library feature I suggested and it's already dramatically cut down the amount of unsightly bash code I need to keep around and maintain.

I'm getting pretty tired of coping with old stuff just because it's there though. Went through this phase with GNU make too.


I struggle with this too. On the plus side, the devil you know is often better than the devil you don't know, and anything new will require re-learning a lifetime's worth of muscle memory. It's also nice to know that your bash scripts are going to be hyper-portable and will still work even many years later. However, it isn't great to be constrained to unsightly code for the sake of extreme backwards compatibility. I've found a nice balance personally where I use ruby if I need anything that bash isn't good at, but it's never a perfectly clean split.

> It's also nice to know that your bash scripts are going to be hyper-portable

Doubt. I'm up to my neck in bashisms, and I require the very latest bash on top of that.

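  # assumption: 'loaded' must be an associative array for the name-keyed lookup to work
  declare -A loaded

  import() {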
    local f
    for f in "$@"; do
      [[ -v loaded[$f] ]] && continue
      loaded[$f]=1
      source -p "${HOME}/.local/lib/bash" "${f}"
    done
  }

  import arguments terminal

The -p flag for source landed in bash 5.3.

Well yes, if you're using newer features, it's not going to be available on older systems that lack a newer bash version with those features available. I think that's pretty reasonable, otherwise we'd have to freeze the language and never add anything. But your older scripts will be very portable between future systems, and across different distros once they update. If you need to target an older system, you can't use newer features, but that's true of everything so I wouldn't expect any different from bash.
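
For scripts that have to run on both sides of that line, one option is to check the bash version and fall back to building the path by hand. This is only a minimal sketch: `bash_has_source_p` and `load_lib` are hypothetical helper names, and the only things assumed from bash itself are the `BASH_VERSINFO` array and the `source -p` option mentioned above.

```bash
#!/usr/bin/env bash

# True when the running bash is at least 5.3, where `source -p` is available.
bash_has_source_p() {
  (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 3) ))
}

# Source one library file, looked up by name in the given directory.
load_lib() {
  local dir=$1 name=$2
  if bash_has_source_p; then
    # Newer bash: let `source` search the given path itself.
    source -p "$dir" "$name"
  else
    # Older bash: join directory and name by hand.
    source "$dir/$name"
  fi
}
```

Usage would look like `load_lib "${HOME}/.local/lib/bash" arguments`. The fallback only handles a single directory, so it is not a full substitute for a search path.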

Sorry, which library feature?

I proposed adding an import builtin to bash on the mailing list. Sent patches too. Didn't go very well, to say the least.

Nevertheless, a version of the feature landed in bash 5.3.

  source -p "${HOME}/.local/lib/bash" file

You can use that to implement the library import function yourself.

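  # assumption: 'loaded' must be an associative array for the name-keyed lookup to work
  declare -A loaded

  import() {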
    local f
    for f in "$@"; do
      [[ -v loaded[$f] ]] && continue
      loaded[$f]=1
      source -p "${HOME}/.local/lib/bash" "${f}"
    done
  }

  import arguments terminal

... and many also have succeeded. fish would not be as popular as it is otherwise, other alternative shells that break bash compatibility are being worked on and are gaining traction, elvish, nushell, murex...

mixing shells is not as hard as some people claim. it's like switching programming languages. i do that all the time. but then, i avoid bash scripting as much as i can (or shell scripting in general). if you actually enjoy bash scripting then switching may be harder.


I might be in a minority, but I actually prefer fish as an interactive shell and bash (or plain /bin/sh) for scripting, if anything because that's what I'm used to :), and it's portable

I did the same thing but I'm now pushing it a bit further: POSIX shell rather than Bash for scripts. If what I'm doing can't be done with that it suggests that I should probably just write it in Python or Perl instead.

Fish scripting is limited to functions/aliases and this works out well since they're easy to read and tweak over time.
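
As a rough illustration of what staying within POSIX shell looks like in practice, here are two common bashisms rewritten portably. The function names are hypothetical; `case` and `tr` are the only tools relied on.

```sh
#!/bin/sh

# Substring test with `case` instead of the bash-only [[ $1 == *"$2"* ]].
contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Lowercasing with `tr` instead of the bash-only ${var,,} expansion.
to_lower() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

Once a script needs arrays or anything much beyond this, that is probably the signal to reach for Python or Perl, as described above.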


that's a sensible approach. fish does have the best interactive interface out there. i switched to elvish because i like it better for complex commandlines, mainly because it has support for more advanced data structures and also integrates json well. (and i realized that using braces for code blocks is nicer in a complex one line command, but both are better than bash for interactive use)

Really like using fish as my interactive shell too.

Many have succeeded writing functional alternative shells for sure, but none have replaced bash at any meaningful scale.

disagree. the fact that i can see more and more support for fish and also start seeing support for elvish shows that those alternative shells have reached a scale meaningful enough that tool developers actually consider it worth their effort to support them. what else is that if not evidence that alternative shells have reached a meaningful scale?

before fish basically no one dared to break bash compatibility. zsh is bash on steroids and other incompatible shells like csh, tcsh, ksh, etc were dead ends in that they kept a niche status.

fish was the first shell to break out of that and actually get noticed and gain a following. i believe that all other alternative shells after fish were encouraged only because of fish's popularity.


Dial-up BBS checks all these boxes. Now have at it!


You don't have a constitutional right to post on Facebook. When you invest your life into platforms run by for-profit corporations, you agree to play by their rules. Merging state and big tech is not going to help.


You have many constitutional protections that do apply in business relationships. Extending that list is at minimum worth considering.


> You don't have a constitutional right to post on Facebook.

Well, that depends on who says you don't. If the government says so, they are wrong, because you do have a constitutional right enforceable against the government to post on Facebook.

The idea of saying "you don't have a constitutional right to post on Facebook" is that you don't have such a right enforceable against Facebook.

Which is true. But under current US law, you do have a civil right enforceable against any public accommodation to be offered the same service that they offer to the public generally.


> You don't have a constitutional right to post on Facebook

Which is why OP describes the U.S. enacting legislation creating a statutory right.


You are correct. But it's a ridiculous suggestion. Can you imagine the local corner store with a bulletin board, and some patron tacks up a picture of a swastika, and the owner of the store is not allowed to take it down?


Au contraire, enacting such a law is akin to forcing FB to support certain speech. That itself is unconstitutional and any such legislation would be struck down.


Good stuff, except don't get too excited about `datalist`. It just doesn't have enough hooks to be actually useful for anything other than a little prototype.


I've used a datalist for autocomplete suggestions and it's worked great.


I've had problems with <datalist> not showing when the input is misspelled, or when none of the <options> strictly begin with the input. I gave up and used an <ol> instead.


I think I’ve tried building a combobox using datalist once but it didn’t work


As you learn more about “raw” html you find all sorts of very fun things that are - ah - not very well implemented if at all.


The neat thing about HTML is that it's a living standard and anyone can contribute. Old bugs get corrected all the time simply because it annoyed a certain person enough for them to push a fix through the standards process.

Unfortunately, it could be around a decade before all three major browsers finally implement the standard, and the fix might not be quite as clean as you originally imagined.


The reason is that there are lots of webpage authors, lots of pages that use old standards and very few browser implementations. That made the browsers carry the burden of making it all work right for everyone.


Case in point: the keygen element.

