The bottleneck here is usually the locally hosted model, not the assistant harness. You can take any off-the-shelf assistant and point the model URL at localhost, but if your local model doesn't have enough post-training and fine-tuning on agentic data, it won't work. The AI Assistant/OpenClaw is just calling APIs in a for loop hooked up to a cron job.
Exactly. OpenClaw is good, but expects the model to behave in a certain way, and I've found that the local options aren't smart enough to keep up.
That being said, my gut says it should be possible to go quite far with a harness that assumes the model might not be very good (and hence double-checks, retries, etc.).
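Something like this sketch, say, where the harness validates the output and feeds failures back in. The endpoint URL, model name, and the JSON-only convention are all made up; any OpenAI-compatible server works the same way:

```python
# Sketch of a harness that tolerates a weak local model: validate the
# output and retry with the error fed back into the conversation.
import json
from openai import OpenAI

# Hypothetical local endpoint; point this at whatever server you run.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def ask_with_retries(prompt: str, max_tries: int = 3) -> dict:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_tries):
        reply = client.chat.completions.create(
            model="local-model",
            messages=messages,
        ).choices[0].message.content
        try:
            return json.loads(reply)  # double-check: did it follow the format?
        except json.JSONDecodeError as err:
            # Feed the failure back and let the model try again.
            messages.append({"role": "assistant", "content": reply})
            messages.append({
                "role": "user",
                "content": f"That wasn't valid JSON ({err}). Reply with JSON only.",
            })
    raise RuntimeError("model never produced valid output")
```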
If you mean something that calls a model you host yourself, then it's just a matter of making the call to the model, which can be done in a million different ways.
If instead you mean running that model on the same device as claw, well... that ain't happening on an ESP32...
I think if you're capable of setting up and running a locally hosted model, then I'd guess the first option needs no explanation. But if you're in the second case, I'd warn you that your eyes are bigger than your stomach and you're going to get yourself into trouble.
It really depends on what resources you have. qwen-code-next will run them, but you'll need at least 64 GB of memory to run it at a reasonable quant and context size.
Most of these agents support OpenAI/Anthropic-compatible endpoints.
All the assistants work with locally hosted models: Home Assistant LLM works with small tuned models to do specific things, and the *Claw stuff works with larger models.
Is Cinder something that could help optimize real-time streaming? We have a UDP stream, and through a pile of GStreamer and NVIDIA DeepStream magic (which I believe the senior dev implemented in Python) we perform some ML inference on the stream in real time.
However, latency is a major issue here and to get to our MVP we didn't really prioritize optimization, as is tradition.
So now I'm wondering whether using Cinder to optimize real-time data streaming is a thing, or whether my asking this just shows I don't understand its use case.
Either way, thank you in advance for your insight.
(Also, we used Django; I'm now wondering whether I should have swapped it out for FastAPI, but that's a separate question.)
Cinder's feature set is highly optimized for IO-bound web services that run under a forked-worker model.
For example: you start a main process, warm it up with a few requests, let the JIT compile the hot paths, and then fork off worker processes to handle the main chunk of traffic.
As of now, it requires hand-tuning to get the best possible performance.
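In plain CPython/Unix terms, the warm-then-fork flow described above looks roughly like this. Nothing here is Cinder-specific API; under Cinder the point of the warm-up is that the JIT compiles the hot functions in the parent, and the forked workers share that compiled code copy-on-write:

```python
# Sketch of the warm-then-fork pattern (Unix only).
import os
import socket

NUM_WORKERS = 4

def business_logic(payload: bytes) -> bytes:
    # Stand-in for the real handler, i.e. the code the JIT should compile.
    return b"HTTP/1.1 200 OK\r\ncontent-length: 2\r\n\r\nok"

def main() -> None:
    # 1. Warm up: exercise the hot paths so they get compiled here,
    #    in the parent, before any worker exists.
    for _ in range(1000):
        business_logic(b"GET / HTTP/1.1\r\n\r\n")

    # 2. Fork workers; children inherit the warmed-up process image.
    server = socket.create_server(("127.0.0.1", 8080), reuse_port=True)
    for _ in range(NUM_WORKERS):
        if os.fork() == 0:          # child: serve traffic forever
            while True:
                conn, _ = server.accept()
                conn.sendall(business_logic(conn.recv(4096)))
                conn.close()
    for _ in range(NUM_WORKERS):    # parent: just supervise
        os.wait()

if __name__ == "__main__":
    main()
```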
In terms of use cases, Cinder does the best when faced with "business logic" code (lots of inheritance, attribute lookups, method calls, etc). It can speed up numerical computations too, but you're probably better off using a library if that's the majority of the workload.
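To give a feel for that "business logic" shape, a toy example (made-up classes; it's the attribute lookups and polymorphic method calls that a JIT can specialize):

```python
# Toy "business logic": attribute lookups plus a polymorphic method
# call, exactly the dynamic dispatch a JIT can cache and specialize.
class Discount:
    rate = 0.0
    def apply(self, price: float) -> float:
        return price * (1 - self.rate)

class MemberDiscount(Discount):
    rate = 0.1

class Order:
    def __init__(self, price: float, discount: Discount) -> None:
        self.price = price
        self.discount = discount

    def total(self) -> float:
        # In the interpreter, self.discount.apply is re-resolved on
        # every call; a JIT can cache the lookup and inline the call.
        return self.discount.apply(self.price)

print(Order(100.0, MemberDiscount()).total())  # 90.0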
Unlike RPython, Static Python in Cinder is not really a subset of Python: it can compile everything (although it will throw compile-time errors if it sees mismatched types). If it can't determine type information, it just assumes the type could be anything and falls back to slower CPython behavior.
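From memory of the Cinder docs (the exact opt-in spelling and error behavior may have drifted since), a Static Python module is just ordinary annotated Python:

```python
import __static__  # opts this module into Static Python compilation (Cinder only)

def scale(values: list, factor: int) -> list:
    # Known types let the compiler emit direct, specialized operations.
    return [v * factor for v in values]

def untyped(x):
    # No annotations: the compiler assumes x could be anything and falls
    # back to ordinary (slower) CPython semantics for this function.
    return x + 1

# scale("oops", 2)   # mismatched types like this are compile-time errors
```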
> So much stuff just from the readme would introduce breaking changes to the Python ecosystem.
Being compatible with the rest of the Python ecosystem is the main reason why Cinder is built on top of CPython. Although yes, some features are indeed very experimental.
> in a world where we have type annotations, JITs feel like a massive step back. Stuff like mypyc could get us way further into high performance stuff
Ah, but that introduces a separate compilation step, which may not be tolerable in every situation.
Why would developers have to interact with a mypyc step any more than the .pyc step? Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?
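To make the comparison concrete, the basic mypyc workflow on a made-up module looks like this; the compiled extension shadows the .py file, so nothing changes at the import site:

```python
# fib.py: ordinary type-annotated Python, no special syntax required.
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Ahead-of-time step (shell):  mypyc fib.py
# This produces a C extension (fib.<abi>.so) next to fib.py; extension
# modules take import precedence, so callers still just `import fib`.
```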
FWIW, I think we could probably buy ourselves a lot of latitude to optimize CPython by designating a much smaller C API surface (like HPy), and then optimizations largely won't have to worry about breaking compatibility with C extensions (which seems to be the biggest reason CPython is unoptimized).
But in general I’ve lost faith in the maintainers’ leadership to drive through this kind of change (or similarly, to fix package management), so I’ve moved on to greener pastures (Go for the most part, with some Rust here and there) and everything is just so easy nowadays compared to my ~15 years as a Python developer.
> Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?
For big monoliths (like ours at IG), server start-up can take more than 10 seconds, which is already painfully slow for an "edit -> refresh" workflow. Introducing a Cython-like compilation step would be a major drawback for every single developer.
For smaller projects, Cython works extremely well (and we do use it for places where we need to interface with C/C++).
> For big monoliths (like ours at IG), server start-up can take more than 10 seconds, which is already painfully slow for an "edit -> refresh" workflow. Introducing a Cython-like compilation step would be a major drawback for every single developer.
So we weren't talking about Cython specifically, but about something Cython-like, i.e., ordinary Python rather than Cython's special syntax. This is important because it means dev builds execute against CPython directly (your code begins executing immediately), while production builds use our hypothetical AOT compiler.
Yes, Static Python especially relies heavily on strict modules, since they let us perform module-local analysis, which in turn enables some cool optimizations.
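For reference, as I understand the Cinder docs (sketch, details may be off), a strict module is an opt-in marker plus a ban on import-time side effects, which is what makes module-local analysis sound:

```python
import __strict__  # opts this module into strict-module checking (Cinder only)

# Top-level code must be side-effect free and deterministic; that is
# what lets the compiler reason about the module in isolation.
LIMIT = 100

def within_limit(n: int) -> bool:
    return n <= LIMIT

# open("/tmp/limits", "w")   # import-time side effect: rejected
```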