I agree on both fronts! BuildKit frontends are not very well known but can be very powerful if you know how they work and how BuildKit transforms them.
After building Depot [0] for the past three years, I can say I have a ton of scar tissue from running BuildKit to power our remote container builders for thousands of organizations.
It looks and sounds incredibly powerful on paper. But the reality is drastically different. It's a big glob of homegrown thoughts and ideas. Some of them are really slick, like build deduplication. Others are clever and hard to reason about, or in the worst case, terrifying to touch.
We had to fork BuildKit very early in our Depot journey. We've fixed a ton of things in it that we hit for our use case. Some of them we tried to upstream early on, but only for it to die on the vine for one reason or another.
Today, our container builders are our own version of BuildKit, so we maintain 100% compatibility with the ecosystem. But our implementation is greatly simplified. I hope someday we can open-source that implementation to give back and show what is possible with these ideas applied at scale.
> It's a big glob of homegrown thoughts and ideas. Some of them are really slick, like build deduplication. Others are clever and hard to reason about, or in the worst case, terrifying to touch.
This is true of packaging and build systems in general. They are often the passion projects of one or a handful of people in an organization - by the time they have active outside development, those idiosyncratic concepts are already ossified.
It's really rare to see these sorts of projects decomposed into building blocks even just having code organization that helps a newcomer understand. Despite all the code being out in public, all the important reasoning about why certain things are the way they are is trapped inside a few dev's heads.
As someone who has worked in the space for a while and been heavily exposed to nix, bazel, cmake, bake, and other systems, and also been in that "passion project" role, I think what I've found is that these kinds of systems are just plain hard to talk about. Even the common elements like DAGs cause most people's eyes to immediately glaze over.
Managers and executives are happy to hear that you made the builds faster or more reliable, so the infra people who care about this kind of thing don't waste time on design docs and instead focus on getting to a minimum prototype that demonstrates those improved metrics. Once you have that, then there's buy-in and the project is made official... but by then the bones have already been set in place, so design documentation ends up focused on the more visible stuff like user interface, storage formats, etc.
OTOH, bazel (as blaze) was a very intentionally designed second system at Google, and buildx/buildkit is similarly a rewrite of the container builder for Docker, so both of them should have been pretty free of accidental engineering in their early phases.
I don't think you can ever get away from accidental engineering in build systems because as soon as they find their niche something new comes along to disrupt it. Even with something homegrown out of shell scripts and directory trees the boss will eventually ask you to do something that doesn't fit well with your existing concepts.
A build system is meant to yield artifacts, run tools, parallelize things, calculate dependencies, download packages, and more. And these are all things that have some algorithmic similarity which is a kind of superficial similarity in that the failure modes and the exact systems involved are often dramatically different. I don't know that you can build something that is that all-encompassing without compromising somewhere.
Blaze and bazel may have been intentionally designed, but it was designed for Google's needs, and it shows (at least from my observations of bazel, I don't have any experience with blaze). It is better now than it was, but it obviously was designed for a system where most dependencies are vendored, and worked better for languages that google used like c++, java, and python.
Blaze instead of make, ant, maven. But now there's cmake and ninjabuild. gn wraps ninjabuild wraps cmake these days fwiu.
Blaze is/was integrated with Omega scheduler, which is not open.
Bazel is open source.
By the time Bazel was open sourced, Twitter had pantsbuild and Facebook had buck.
OpenWRT's Makefiles are sufficient to build OpenWRT and the kernel for it. (GNU Make is still sufficient to build the Linux kernel today, in 2026.)
Make compares files to determine whether to rebuild them if they already exist; by comparing file modification time (mtime) unless the task name is in the .PHONY: list at the top of the Makefile. But the task names may not contain slashes or spaces.
`docker build` and so also BuildKit archive the build chroot after each build step that modifies the filesystem (RUN, ADD, COPY) as a cacheable layer identified by a hash of its content.
The FROM instruction creates a build stage from scratch or from a different container layer.
Dockerfile added support for Multi-stage builds with multiple `FROM` instructions in 2017 (versions 17.05, 17.06CE).
`docker build` is now moby and there is also buildkit? `podman buildx` seems to work.
nerdctl supports a number of features that have not been merged back to docker or to podman.
> it obviously was designed for a system where most dependencies are vendored, and worked better for languages that google used like c++, java, and python.
Those were the primary languages at google at the time. And then also to build software? Make, shell scripts, python, that Makefile calls git which calls perl so perl has to be installed, etc.
>> There are default gcc and/or clang compiler flags in distros' default build tools; e.g. `make` specifies additional default compiler flags (that e.g. cmake, ninja, gn, or bazel/buck/pants may not also specify for you).
Which CPU microarchitectures and flags are supported?
AVX-512 is in x86-64-v3. By utilizing features like AVX-512, we would save money (by utilizing features in processors newer than Pentium 4 (x86-64-v1)).
How to add an `-march=x86-64-v3` argument to every build?
How to add build flags to everything for something like x86-64-v4?
Which distros support consistent build parametrization to make adding global compiler build flags for multiple compilers?
- Gentoo USE flags
- rebuild a distro and commit to building the core and updates and testing and rawhide with your own compiler flags and package signatures and host mirrored package repos
- Intel Clear Linux was cancelled.
- CachyOS (x86-64-v3, x86-64-v4, Zen4)
- conda-forge?
Gentoo:
- ChromiumOS was built on gentoo and ebuild IIRC
- emerge app-portage/cpuid2cpuflags, CPU_FLAGS_X86=, specify -march=native for C/[C++] and also target-cpu=native for Rust in /etc/portage/make.conf
The ansible-in-containers thing is very much an unsolved problem. Basically right now you have three choices:
- install ansible in-band and run it against localhost (sucks because your playbook is in a final image layer; you might not want Python at all in the container)
- copy a previous stage's root into a subdirectory and then run ansible on that as a chroot, afterward copy the result back to a scratch container's root.
All of these options fall down when you're doing anything long-running though, because they can't work incrementally. As soon as you call ansible (or any other tool), then from Docker's point of view it's now a single step. This is really unfortunate because a Dockerfile is basically just shell invocations, and ansible gives a more structured and declarative-ish way to do shell type things.
I have wondered if a system like Dagger might be able to do a better job with this, basically break up the playbook programmatically into single task sub-playbooks and call each one in its own Dagger task/layer. This would allow ansible to retain most of its benefits while not being as hamstrung by the semantics of the caller. And it would be particularly nice for the case where the container is ultimately being exported to a machine image because then if you've defined everything in ansible you have a built-in story for freshening that deployed system later as the playbook evolves.
With multi-stage Dockerfiles, you only copy the final, built application artifacts from the earlier stage(s). Then, building a package as one signed file to copy is justified and easier anyway.
There's always:
RUN dnf remove -y ansible && dnf clean all
I thought there was a native way to build container images with ansible that don't have ansible installed in the image though?
> The Build Process Explained: When you run ansible-builder build, it goes through these steps:
> Reads your `execution-environment.yml` definition,
Resolves collection dependencies (including transitive dependencies),
Generates a `Containerfile` in a `context/` directory,
Copies dependency files into the build context,
Runs the container build using Podman or Docker
It probably shouldn't (?) parallelize because that wouldn't be a deterministic build; installing A then B is not the same as installing B then A. (Is not the same thing as installing A in one container image layer, B in another container image layer, and then trying to merge the package databases.) A given package B could conditionally install or configure according to whether or not A is already installed, and so for example package install tasks are not commutative.
.
Bootc (osbuild) builds VM and native machine images from Containerfiles:
I introduced Depot at my org a few months ago and I've been very happy with it. Conceptually it's simple: a container builder that starts warm with all your previously built layers right there, same as it would be running local builds. But a lot goes into making it actually run smoothly, and the performance-focused breakdown that shows where steps depend on each other and how much time each is taking is great.
It's clear a ton of care has gone into the product, and I also appreciated you personally jumping onto some of my support tickets when I was just getting things off the ground.
Thank you for the very kind words and for your support. Depot is full of incredible people who love helping others. So while you might see me on a ticket from time to time, it’s really an entire team that is behind everything we do.
Thank you for the post! It’s well done and you captured a lot of the concepts in BuildKit in an easy to understand way. Not an easy thing to do at all.
What’s the security situation around OpenClaw today? It was just a week or two ago that there was a ton of concern around its security given how much access you give it.
I don’t think there’s any solution to what SimonW calls the lethal trifecta with it, so I’d say that’s still pretty impossible.
I saw on The Verve that they partnered with the company that repeatedly disclosed security vulnerabilities to try to make skills more secure though which is interesting: https://openclaw.ai/blog/virustotal-partnership
I’m guessing most of that malware was really obvious, people just weren’t looking, so it’s probably found a lot. But I also suspect it’s essentially impossible to actually reliably find malware in LLM skills by using an LLM.
Regarding prompt injection: it's possible to reduce the risk dramatically by:
1. Using opus4.6 or gpt5.2 (frontier models, better safety). These models are paranoid.
2. Restrict downstream tool usage and permissions for each agentic use case (programmatically, not as LLM instructions).
3. Avoid adding untrusted content in "user" or "system" channels - only use "tool". Adding tags like "Warning: Untrusted content" can help a bit, but remember command injection techniques ;-)
4. Harden the system according to state of the art security. 5. Test with red teaming mindset.
Anyone who thinks they can avoid LLM Prompt injection attacks should be asked to use their email and bank accounts with AI browsers like Comet.
A Reddit post with white invisible text can hijack your agent to do what an attacker wants. Even a decade or 2 back, SQL injection attacks used to require a lot of proficiency on the attacker and prevention strategies from a backend engineer. Compare that with the weak security of so called AI agents that can be hijacked with random white text on an email or pdf or reddit comment
There is no silver bullet, but my point is: it's possible to lower the risk. Try out by yourself with a frontier model and an otherwise 'secure' system: the "ignore previous instructions" and co. are not working any more. This is getting quite difficult to confuse a model (and I am the last person to say prompt injection is a solved problem, see my blog).
> Adding tags like "Warning: Untrusted content" can help
It cannot. This is the security equivalent of telling it to not make mistakes.
> Restrict downstream tool usage and permissions for each agentic use case
Reasonable, but you have to actually do this and not screw it up.
> Harden the system according to state of the art security
"Draw the rest of the owl"
You're better off treating the system as fundamentally unsecurable, because it is. The only real solution is to never give it untrusted data or access to anything you care about. Which yes, makes it pretty useless.
Wrapping documents in <untrusted></untrusted> helps a small amount if you're filtering tags in the content. The main reason for this is that it primes attention. You can redact prompt injection hot words as well, for cases where there's a high P(injection) and wrap the detected injection in <potential-prompt-injection> tags. None of this is a slam dunk but with a high quality model and some basic document cleaning I don't think the sky is falling.
I have OPA and set policies on each tool I provide at the gateway level. It makes this stuff way easier.
The issue with filtering tags: LLM still react to tags with typos or otherwise small changes. It makes sanitization an impossible problem (!= standard programs).
Agree with policies, good idea.
I filter all tags and convert documents to markdown as a rule by default to sidestep a lot of this. There are still a lot of ways to prompt inject so hotword based detection is mostly going to catch people who base their injections off stuff already on the internet rather than crafting it bespoke.
Agree for a general AI assistant, which has the same permissions and access as the assisted human => Disaster. I experimented with OpenClaw and it has a lot of issues. The best: prompt injection attacks are "out of scope" from the security policy == user's problem.
However, I found the latest models to have much better safety and instruction following capabilities. Combined with other security best practices, this lowers the risk.
> I found the latest models to have much better safety and instruction following capabilities. Combined with other security best practices, this lowers the risk.
It does not. Security theater like that only makes you feel safer and therefore complacent.
As the old saying goes, "Don't worry, men! They can't possibly hit us from this dist--"
If you wanna yolo, it's fine. Accept that it's insecure and unsecurable and yolo from there.
Honestly, 'malware' is just the beginning it's combining prompt injection with access to sensitive systems and write access to 'the internet' is the part that scares me about this.
I never want to be one wayward email away from an AI tool dumping my company's entire slack history into a public github issue.
It's still bad, even if they fixed some low hanging fruits. Main issue: prompt injection when using the LLM "user" channel with untrusted content (even with countermeasures and frontier model) combined with insecure config / plugins / skills... I experimented with it: https://veganmosfet.github.io/2026/02/02/openclaw_mail_rce.h...
My company has the github page for it blocked. They block lots of AI-related things but that's the only one I've seen where they straight up blocked viewing the source code for it at work.
As a founder of Depot [0], where we offer our own faster and cheaper GitHub Actions runners, I can assure everyone that this feeling is the majority and not the minority.
Sounds strange to say as someone who has a product that is built around making GitHub Actions exponentially faster to close these feedback loops faster.
But I can honestly say it's only really possible because the overall system with GitHub Actions is so poor. We discover new bottlenecks in the runner and control plane weekly. Things that you'd think would be simple are either non-existent, don't work, or have terrible performance.
I'm convinced there are better ways of doing things, and we are actively building ideas in that realm. So if anybody wants to learn more or just have a therapy session about what could be better with GitHub Actions, my email is in my bio.
Founder of Depot[0] here. I'm disappointed by this change and by the impact this is going to have on all self-hosted runner customers, not just us. In my view, this is GitHub extracting more revenue from the ecosystem for a service that is slow, unreliable, and that GitHub has openly not invested in.
We will continue to do our best to provide the fastest GHA runners and keep them cheaper than GitHub-hosted runners.
Founder of Depot[0] here. To answer your idea, at Depot we already have this concept internally. In fact, Depot isn't reliant on webhooks at all to run your jobs. One of the reasons we can be up running your jobs when GitHub webhooks service is down. Effectively, we listen to a different system to know you have a job that needs to be run.
To your second statement, I generally agree. Sounds strange to say given we're in the business of GHA runners. But it's just not a performant or reliable system at scale. This change from GitHub doesn't smell of a company that wants to do right by it's users.
If you are interested in what is up next for us at Depot, feel free to ping me via the email in my bio. I think you'll be quite interested in what we are doing.
> This is the best thing I've seen all month. I'm actually blown away at just how accurate it is in making up the potential front page posts.
Hold on. The future hasn't happened yet.
I think what you mean is that you are blown away at just how plausible of a prediction it is. Probably meaning that something about it meshes with ideas already kicking around your head.
Founder of Depot here. We provide faster and more reliable GitHub Actions runners (as well as other build performance services) at half the cost of GitHub [0]
Is there a write up on the security of actions or equivalent that explains how they are secure both with direct and transitive dependencies? If this applies to Depot.
reply