100% this! This is why I explicitly tell Claude in my CLAUDE files to use Python generously (via the virtualenv) for any calculations. It was also my first time using Claude for tax-related calculations, so I cross-checked it against an online ACB calculator: https://www.adjustedcostbase.ca
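For anyone curious, the average-cost ACB math itself fits in a few lines of Python. The numbers below are made up for illustration, and this ignores edge cases like the superficial-loss rule — definitely not tax advice:

```python
def process(transactions):
    """Track adjusted cost base (ACB) using the average-cost method (Canada)."""
    shares = 0.0
    acb = 0.0      # total adjusted cost base, in dollars
    gains = []     # realized capital gain/loss per sale
    for action, qty, price, commission in transactions:
        if action == "buy":
            acb += qty * price + commission   # commissions increase ACB
            shares += qty
        else:  # sell
            acb_per_share = acb / shares
            proceeds = qty * price - commission  # commissions reduce proceeds
            gains.append(proceeds - qty * acb_per_share)
            acb -= qty * acb_per_share        # ACB shrinks proportionally
            shares -= qty
    return shares, acb, gains

# Hypothetical trades: buy 100 @ $10 (+$5 fee), buy 50 @ $12 (+$5 fee),
# then sell 75 @ $15 (-$5 fee)
shares, acb, gains = process([
    ("buy", 100, 10.0, 5.0),
    ("buy", 50, 12.0, 5.0),
    ("sell", 75, 15.0, 5.0),
])
```

Plugging the same trades into a calculator like adjustedcostbase.ca is a good way to sanity-check whatever the LLM produces.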
> I use structured JSON documents precisely for this reason, because with a well-defined schema the LLM can write TypeScript functions that must compile.
The intention of this is to reduce hallucination on information extraction, right?
Also, how do you convert your docs / information into JSON documents?
> The intention of this is to reduce hallucination on information extraction, right?
Correct.
> Also, how do you convert your docs / information into JSON documents?
Right now you have to add it yourself to the database. The idea is that you use Superego as the software in which you record your expenses / income / whatever, so the data is naturally there already.
But I'm also working on an "import from csv/json/etc" feature, where you drop in a file and the AI maps the info it contains to the collections you have in the database.
Oh, interesting, I didn't know about this project, thanks for sharing!
I tried to implement something like this for the search functionality, but ended up going with "old school" lexical search instead.
Mostly because, in my experimentation, vector search didn't perform significantly better, and in some cases it performed worse, all while being much more expensive on several fronts: indexing time (which also requires either an API or a ~big local model), storage, search time, and implementation complexity.
And Superego's agent actually does quite well with the lexical search tool. The model usually tries a few different queries in parallel, which approximates semantic search a bit.
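The multi-query trick is easy to illustrate. This is not Superego's actual implementation — just a toy sketch using naive token-overlap scoring, with results merged across several phrasings of the same question:

```python
def lexical_search(docs, query, limit=3):
    """Naive lexical scoring: rank documents by query-token overlap."""
    q_tokens = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(q_tokens & set(text.lower().split()))
        if score:
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:limit]]

def multi_query_search(docs, queries, limit=3):
    """Merge results from several query phrasings, keeping first-seen order."""
    seen = []
    for q in queries:  # in practice the agent can issue these in parallel
        for doc_id in lexical_search(docs, q, limit):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen

# Hypothetical expense records
docs = {
    1: "dinner at an italian restaurant",
    2: "monthly rent payment",
    3: "groceries at the supermarket",
}
hits = multi_query_search(docs, ["restaurant bill", "dinner out", "eating at a restaurant"])
```

Each individual query misses some relevant documents, but the union over rephrasings recovers most of what an embedding search would find, with none of the indexing cost.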
As a counter-argument to the kubectl example in the article, I found the k8s MCP (https://github.com/containers/kubernetes-mcp-server) particularly useful for restricting LLM access to certain tools, such as the exec and delete tools, something that isn't doable out of the box with the kubectl CLI (unless you use the --as or --as-group flags and don't tell the LLM which user/group those are).
I have used the k8s MCP directly inside GitHub Copilot Chat in VS Code and restricted the write tools in the Configure Tools prompt. With a pseudo-protocol established via this MCP and the IDE integration, I find it much safer to prompt the LLM into debugging a live K8s cluster than without any such primitives.
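For reference, the kubectl-side alternative mentioned above looks roughly like this: define a read-only RBAC role, bind it to a dedicated user, and have the agent run everything through impersonation. All names below are made up:

```yaml
# Read-only role: no pods/exec resource, no write verbs (names are hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: llm-readonly
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "pods/log", "deployments", "jobs", "events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: llm-readonly-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: llm-readonly
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: llm-debugger
```

Then the agent's commands run as `kubectl --as=llm-debugger ...`, and exec/delete fail with Forbidden — but, as noted, this only holds as long as the LLM never learns it could drop the flag.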
Orchestera (https://orchestera.com/) - Fully managed Apache Spark clusters in your own AWS account with no additional compute markups, unlike EMR and Databricks.
Currently implemented the following:
- Automated scale in / scale out of nodes for Spark executors and drivers via Karpenter
- Jupyter notebook integration that works as a Spark driver for quick iteration and prototyping
- Simple JSON-based IAM permissions management via AWS Parameter Store
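I don't know Orchestera's actual schema, but to make the JSON-permissions idea concrete, a Parameter Store entry might look something like this (purely hypothetical shape — the keys and action names are invented for illustration):

```json
{
  "user": "data-engineer@example.com",
  "clusters": ["analytics-prod"],
  "allow": ["spark:SubmitJob", "notebook:Open"],
  "deny": ["cluster:Delete"]
}
```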
Work-in-progress this month:
- Jupyterhub based Spark notebook provisioning
- Spark History Server
- Spark History Server MCP support with chat interface to support Spark pipeline debugging and diagnostics
Spent my 2025 building https://orchestera.com as a side project. The premise is simple - give Data Engineers and Data Scientists the ability to spin up Apache Spark clusters on Amazon EKS without needing to know all the infrastructure details: how Spark works on Kubernetes, auto-scaling, etc.
The platform I am building lets users launch Spark on Kubernetes in their own AWS account without any markup on EC2 CPU/memory costs. For example, the AWS EMR offering adds a 25% markup on top of EC2 instance pricing, and Databricks' markup is even higher, ranging anywhere from 30% to 100%.