The old mental model doesn't fit how any OS manages RAM. Every OS plays all sorts of fun guessing games about caching, predicting what resources your program will actually need, and so on. The OS does a lot of work to ensure that everything just hums along as best as possible.
When I was at AWS over a decade ago, there were endless complaints about the elevator algorithms from engineers, with the usual egotistical tech-bro insistence that they could do better. Things used to really suck at lunchtime, when folks would flood to the elevators and be stuck waiting for ages. Those same geniuses could never figure out the benefit of staggering lunch times.
Someone got really tired of it and somehow organised a hackathon weekend with the elevator company, letting teams of engineers have at it.
Every single team failed to come up with better algorithms. All the complaints stopped dead.
I don't think the disdain was for seeking improvements. It was for the tech bros thinking they could solve any problem, even in domains they had never worked in, better than anyone actually working in said domain.
The disdain on my part was very much towards the egotistical tech bros who were convinced they could do better in fields they had no background in.
It used to get really tiring seeing the rhetoric about every single field of expertise. So many tech bros that were simultaneously experts in law, geopolitics, elevators, building codes, architecture, sports, transportation, finance, global logistics, and beyond. Literally from one second to the next. Any time anything wasn't 100% perfect, it was because the people working in the field were idiots and they could have done it so much better.
I've been testing the same with some Rust, and it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself. It seems a little more likely to jam up than some of the other models I've experimented with.
It's also driving itself crazy with the deadpool and deadpool-r2d2 crates it chose during the planning phase.
That said, it does seem to be doing a very good job in general; the code it has created is mostly sane, other than this fuss over the database layer, which I suspect I'll have to intervene on. It's certainly doing a better job than the other models I'm able to self-host so far.
> it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself.
I think this is part of the model’s success. It’s cheap enough that we’re all willing to let it run for extremely long times. It takes advantage of that by being tenacious. In my experience it will just keep trying things relentlessly until eventually something works.
The downside is that it’s more likely to arrive at a solution that solves the problem I asked but does it in a terribly hacky way. It reminds me of some of the junior devs I’ve worked with who trial and error their way into tests passing.
I frequently have to reset it and start it over with extra guidance. It’s not going to be touching any of my serious projects for these reasons but it’s fun to play with on the side.
Some of the early quants had issues with tool calling and looping. So you might want to check that you're running the latest version / recommended settings.
> and it has spent a fair bit of time going through an infinite-seeming loop before finally unjamming itself
I can live with this on my own hardware. Opus 4.6, meanwhile, has developed a tendency to happily chew through the entire 5-hour allowance on the first instruction, going in endless circles. I've stopped using it for anything except the heaviest planning tasks now.
I don't know much about how these models are trained, but is this behavior intentional (ie, the people pulling the levers knew that this is how it would end up), or is it emergent (ie, pulling the levers to see what happens)?
I haven't seen a page on HF that'll show me "what models will fit", it's always model by model. The shared tool gives a list of a whole bunch of models, their respective scores, and an estimated tok/s, so you can compare and contrast.
I wish it didn't require running on the machine, though. Just let me define my spec on a web page and have it spit out the results.
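For a rough sense of what such a page would compute, here's a back-of-envelope "will it fit" check. The formula, bytes-per-param figures, and 20% overhead are my own assumptions, not the linked tool's actual logic:

```python
# Rough fit estimate: weights plus a fudge factor for KV cache and
# activations. All constants here are illustrative assumptions.
def model_fits(params_billions: float, bytes_per_param: float,
               mem_gb: float, overhead_frac: float = 0.2) -> bool:
    weights_gb = params_billions * bytes_per_param  # 1e9 params ~= 1 GB per byte/param
    needed_gb = weights_gb * (1 + overhead_frac)    # KV cache, activations, etc.
    return needed_gb <= mem_gb

print(model_fits(7, 0.5, 8))    # 7B at ~4-bit quant in 8 GB -> True (~4.2 GB needed)
print(model_fits(70, 2.0, 24))  # 70B at fp16 in 24 GB -> False (~168 GB needed)
```

A web form could do exactly this without ever touching your machine; the hard part the tool presumably handles is measuring real tok/s, which you can't estimate from a spec sheet alone.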
I took a quick look, and the dependency on PHP 8.5 is mildly irritating: even Ubuntu 26.04 isn't lined up to ship with that version; it's on 8.4.11.
You mention in the README that the goal is to run things in a standard environment, but then you're requiring a near-bleeding-edge PHP version that people are unlikely to be using?
I thought I'd just quickly spin up a container and take a look out of interest, but now it looks like I'll have to go dig into building my own PHP packages, or compiling my own version from scratch to even begin to look at things?
Hell, the number of times I've used a lot of the data structures that come up in leetcode exercises without at least looking at some reference material is pretty small. I usually assume I'm going to misremember it, and go double check before I write it so I don't waste ages debugging later.
> reaching to the point of saving php code into the mysql database and executing from there.
WordPress loves to shove PHP objects into the database (it's been a good long while since I used it and I don't remember the mechanism, but it'd be the equivalent of `pickle` in Python, only readable by PHP).
Not sure if they've improved it since I last dealt with it about 15 years ago, but at the time there was no way to have a fully separated staging and production environment; lots of the data stored in the database that way had hardcoded domain names baked into it. We needed a staging and production kind of setup, so we ended up writing a PHP script that would dump the staging database, fix every reference, and push it to production. Fun times.
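For anyone who hasn't hit this: WordPress stores those blobs in PHP's `serialize()` format, which length-prefixes every string (e.g. `s:19:"http://staging.test";`), so a naive find-and-replace on domain names corrupts the data because the length no longer matches. A rough Python sketch of the repair step (the function name and regex are illustrative, not the script we actually wrote, and the regex ignores strings containing embedded `";`):

```python
import re

def replace_and_fix_lengths(blob: str, old: str, new: str) -> str:
    """Replace a substring in a PHP-serialized dump, then recompute
    the byte-length prefix of each serialized string token."""
    blob = blob.replace(old, new)

    def fix(m: re.Match) -> str:
        s = m.group(2)
        return f's:{len(s.encode())}:"{s}";'

    # Matches tokens like s:19:"http://staging.test";
    return re.sub(r's:(\d+):"(.*?)";', fix, blob)

dump = 's:19:"http://staging.test";'
print(replace_and_fix_lengths(dump, "staging.test", "production.example"))
# s:25:"http://production.example";
```

This is exactly why plain `sed` on a WordPress dump tends to produce rows PHP can no longer unserialize.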
> The network latency bit deserves more attention. I’ve been trying to find out where AI companies are physically serving LLMs from but it’s difficult to find information about this. If I’m sitting in London and use Claude, where are the requests actually being served?
Unfortunately, as with most of the AI providers, it's wherever they've been able to find available power and capacity. They have contracts with all of the large cloud vendors, and lack of capacity is a significant enough issue that locality isn't really part of the equation.
The only thing they're particular about locality for is the infrastructure they use for training runs, where they need lots of interconnected capacity with low-latency links.
Inference is wherever, whenever. You could be having your requests processed halfway around the world, or right next door, from one minute to the next.
I've been fighting with an AI code review tool about similar issues.
That, and it can't understand that a tool that runs as the user on their own laptop really doesn't need to sanitise the inputs when it's generating a command. If the user wanted to execute the command, they could do so without having to obfuscate it sufficiently to get through the tool. Nope, gotta waste everyone's time running sanitisation methods. Or just ignore the stupid code review tool.
There is a plausible scenario in which a user finds some malicious example of CLI params for running your command and pastes it into the terminal. You don't have to handle this scenario, but it would be nice to.
There is a plausible scenario where a user cuts their wrist open cooking dinner. You don't have to file the edge off cooking knives, but won't you think of the children?
Kitchen knives actually do have safety features, such as non-slip handles and finger guards, which users appreciate. I certainly do. Users also appreciate safeguards in cli tools, such as not deleting all data if input happens to be slightly wrong. Sure, you could design your tool to be used exclusively by leet hackers, but the idea of sanitizing your inputs is not completely preposterous.
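For what it's worth, there's a cheap middle ground between "sanitise everything" and "trust leet hackers": if the tool interpolates user input into a shell command at all, quoting it is nearly free. A generic Python illustration (not the tool under discussion) using the standard library's `shlex.quote`:

```python
import shlex

# Quoting, not sanitising: the input is preserved exactly, but it
# can't break out of its argument position in the generated command.
user_input = "foo; rm -rf /"
cmd = "grep -- " + shlex.quote(user_input) + " notes.txt"
print(cmd)  # grep -- 'foo; rm -rf /' notes.txt
```

This sidesteps the whole debate: nothing is rejected or rewritten, so a local user loses no power, but pasted-in malicious params become inert text.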
I've seen stuff posted about chained-assignment footguns in Python regularly over the years, and it always surprises me. I don't think I've ever written one, or reviewed code that uses them. It wouldn't even occur to me to think about writing a chained assignment.
Is chained assignment a pattern that comes from another language that people are applying to python?
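For anyone who hasn't seen the footgun being referenced: chained assignment in Python binds every name to the same object, which bites when that object is mutable:

```python
# Chained assignment: both names point at the SAME list object.
a = b = []
a.append(1)
print(b)       # [1] -- b sees the mutation, because a and b alias one list
print(a is b)  # True

# Contrast with two separate assignments, which create two lists:
c, d = [], []
c.append(1)
print(d)       # []
```

With immutable values (`x = y = 0`) the aliasing is harmless, which is probably why the pattern survives in code that later grows a mutable on the right-hand side.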