What you're saying is true, but the OP has a point too.
What's basically happening is that as things get faster, the lifetime of training data decreases because the system becomes more sensitive to environmental conditions. Training procedures which were previously performed earlier in the manufacturing cycle are now delegated to the runtime, so the system migrates from data to code.
Previously, you or the vendor would provide tools and a calibration system which would infer some values and burn a calibration, and then load it during early boot. More recently, the runtime is usually a combination of a microcontroller and fixed-function blocks on the DDR PHY, and that microcontroller's firmware is usually supplied as a generic blob by the vendor. The role of this part of the system keeps growing. The system has gotten a bit more closed; it's increasingly moved from "use this magic tool to generate these magic values, or read the datasheets and make your own magic tool" to "load this thing and don't ask questions."
You played in hard mode in a weird sense; more modern DDR versions are in a backwards sense "easier" if you're buying the IP, because a lot of the training has moved to boot time and is handled by the vendor IP rather than needing to be run during burn-in using some proprietary toolkit or self-test tool.
It's just as arcane and weird, but if you buy one of the popular modern packages for DDR4/5 like DesignWare, more and more training is accomplished using opaque blob firmware (often ARC) loaded into an embedded calibration processor in the DDR controller itself at boot time rather than constants trained by your tooling or the vendor's.
I don't know if this is still the case, but back then the likes of Synopsys charged a lot of money for what was very limited controller functionality; you were stuck with their frustrating support channels and generally dumpster-fire firmware. Our controller was fully custom to our needs, supporting more optimal refresh schemes tightly integrated with our application, multiple memory protocols (not just DDR3), and I don't remember what else.
At least we were able to modify the training algorithms and find the improvements, rather than being stuck with the usual vendor "works for us" response. Especially with something like commodity DDR, where our quantities don't command much clout. But it was a bit of an ordeal and may have contributed to us buying in a controller for our next gen (not DDRx). But I think we're going the other way again after that experience..!
IMO this really isn’t a huge problem for this project specifically, since that part is outsourced to tree-sitter which has a lot of effort behind it to begin with.
I think this project is incredibly cool as a line of research / thought but my general experience in trying to provide human interfaces using abstractions over source code suggests that most people in general and programmers especially are better at reasoning in the source code space. Of course, beagle can generate into the source code space at each user interaction point, but at that point, why not do the opposite thing, which is what we already do with language servers and AST driven (semantic) merge and diff tools?
It's also just one more facet. The problem already exists for anything else that we already have, like formatters, linters, syntax highlighters, language servers... And it's also not an exclusive choice. If you want to use a dumb editor, there's nothing preventing that. All of the machinery to go back and forth to text exists. Not really a huge departure.
I find Mergiraf pretty pleasant to use and frequently pretty helpful as a time-saver. Handles TOML and Rust for me, and I have way fewer manual interventions, especially after supplementing it with rustfmt rules to not do a bunch of merged use statements in one go. Easy to configure as a jujutsu tool as well.
100% agree. I think AST-driven tooling is very valuable (most big companies have internal tools akin to each operation Beagle provides, and Linux have Coccinelle / Spatch for example), but it's still just easier implemented as a layer on top of source code than the fundamental source of truth.
There are some clever things that can be done with merge/split using CRDTs as the stored transformation, but they're hard to reason about compared to just semantic merge tools, and don't outweigh the cognitive overhead IMO.
Having worked for many years with programming systems which were natively expressed as trees - often just operation trees and object graphs, discarding the notion of syntax completely - I can say this layer is incredibly difficult for humans to reason about, especially when it comes to diffs, and usually in the end you have to build a system which can produce and act upon text-based diffs anyway.
I think there's some notion of these kinds of revision management tools being useful for an LLM, but again, at that point you might as well run them aside (just perform the source -> AST transformation at each commit) rather than use them as the core storage.
You can parse the text at any time, pretty much for free, and use anything you learn to be smarter about manipulating the text. You can literally replace the default diff program today with one that parses the source files to do a better job.
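As a minimal sketch of that round trip, here's Python's stdlib `ast` module standing in for a real parser like tree-sitter (the rename is a made-up example, not any particular tool's behavior): parse the text on demand, manipulate in tree space, and go back to text whenever needed.

```python
import ast

# Plain text is the source of truth...
source = "def add(a, b):\n    return a + b\n"

# ...but we can lift it into an AST at any time, essentially for free.
tree = ast.parse(source)

# Work in the tree space: e.g. a semantic rename of the function.
func = tree.body[0]
func.name = "plus"

# And drop back to text whenever we need it.
print(ast.unparse(tree))  # def plus(a, b): ...
```

The same shape underlies AST-aware diff/merge tools: the tree is derived from the text when useful, rather than stored as the base representation.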
This is the fundamental idea behind git - to fully compute/derive diffs from snapshots (commits) and to only store snapshots. While brilliant in some ways - particularly the simplifications it allows in terms of implementation, I’ve always felt that dropping all information about how a new commit was derived from its parent(s) was wasteful. There have been a number of occasions where I wished that git recorded a rename/mv somehow - it’s particularly annoying when you squash some commits and suddenly it no longer recognizes that a file was renamed where previously it was able to determine this. Now your history is broken - “git blame” fails to provide useful information, etc. There are other ways of storing history and revisions which don’t have this issue - git isn’t the end of the line in terms of version control evolution.
I agree with this, I just don't think I agree with the Beagle approach (CRDT on AST as the source of truth) vs. the Git method (bytewise files as the source of truth) with something alongside.
Like, I think it's way easier to add a parallel construction to Git (via a formal method or even a magic file) which includes the CRDT for the AST than it is to make that the base unit of truth. It still lets you answer and interact at the higher level with "oh, this commit changed $SYM1 to $SYM2" without also destroying byte-level file information that someone finds important, and without changing the main abstraction from the human-space to the computer-space.
CRDT's trick is metadata. Good old diff guesses the changes by solving the longest-common-subsequence problem. There is always some degree of confusion as changes accumulate. CRDTs can know the exact changes, or at least guess less.
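A tiny illustration of that confusion, using Python's stdlib difflib (which uses a related sequence-matching heuristic rather than strict LCS): two different edits converge to the same text, and the matcher has to pick one story.

```python
import difflib

# Deleting the FIRST "b" or the SECOND "b" both yield ["a", "b", "c"].
# From the before/after snapshots alone, a diff cannot tell which happened.
old = ["a", "b", "b", "c"]
new = ["a", "b", "c"]

sm = difflib.SequenceMatcher(None, old, new)
ops = [op for op in sm.get_opcodes() if op[0] != "equal"]
print(ops)  # a single 'delete' opcode: the matcher guessed a copy for us
```

A CRDT that recorded the actual edit operations would know which "b" the author removed; the snapshot diff only reconstructs a plausible story.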
This is really interesting to me; I have the opposite belief.
My worry is that any idiot can prompt themselves to _bad_ software, and the differentiator is in having the right experience to prompt to _good_ software (which I believe is also possible!). As a very seasoned engineer, I don't feel personally rugpulled by LLM generated code in any way; I feel that it's a huge force multiplier for me.
Where my concern about LLM generated software comes in is much more existential: how do we train people who know the difference between bad software and good software in the future? What I've seen is a pattern where experienced engineers are excellent at steering AI to make themselves multiples more effective, and junior engineers are replacing their previous sloppy output with ten times their previous sloppy output.
For short-sighted management, this is all desirable since the sloppy output looks nice in the short term, and many organizations strategically think they are pointed in the right direction doing this and are happy to downsize while blaming "AI." And for places where this never really mattered (like "make my small business landing page"), it's a complete upheaval, without a doubt.
My concern is basically: what will we do long term to get people from one end to the other without the organic learning process that comes from having sloppy output curated and improved with a human touch by more senior engineers, and without an economic structure which allows "junior" engineers to subsidize themselves with low-end work while they learn? I worry greatly that in 5-10 years many organizations will end up with 10x larger balls of "legacy" garbage and 10x fewer knowledgeable people to fix it. For an experienced engineer I actually think this is a great career outlook and I can't understand the rug pull take at all; I think that today's strong and experienced engineer will command a great deal of money and prestige in five years as the bottom drops out of software. From a "global outcomes" perspective this seems terrible, though, and I'm not quite sure what the solution is.
>For short-sighted management, this is all desirable since the sloppy output looks nice in the short term
It was a sobering moment for me when I sat down to look at the places I have worked for over my career of 20-odd years. The correlation between high-quality code and economic performance was not just nonexistent, it was almost negative. As in: whenever I have worked at a place where engineering felt like a true priority, tech debt was well managed, principles followed, that place was not making any money.
I am not saying that this is a general rule, of course there are many places that perform well and have solid engineering. But what I am saying is that this short-sighted management might not be acting as irrationally as we prefer to think.
I generally agree; for most organizations the product is the value and as long as the product gives some semblance of functionality, improving along any technical axis is a cost. Organizations that spend too much on engineering principles usually aren’t as successful since the investment just isn’t worth it.
But, I have definitely seen failure due to persistent technical mistakes, as well, especially when combined with human factors. There’s a particularly deep spiral that comes from “our technical leadership made poor choices or left, we don’t know what to invest in strategically so we keep spending money on attempted refactors, reorgs, or rewrites that don’t add more value, and now nobody can fix or maintain the core product and customers are noticing;” I think that at least two companies I’ve worked at have had this spiral materially affect their stock price.
I think that generative coding can both help and hurt along this axis, but by and large I have not seen LLMs be promising at this kind of executive function (ie - “our aging codebase is getting hard to maintain, what do we need to do to ensure that it doesn’t erode our ability to compete”).
As has always been the case - though forgotten through most of two boom times in the industry - specification is everything.
If you adequately specify what you want, then LLMs today are perfectly capable of producing code of a quality exceeding most humans'.
But what has been going on is that many of the details of architecture and code have been implied as "good practice" or "experience" because it is time consuming to write a good specification, partly because you need to first work out exactly what you want.
2. We'll come up with better techniques to make guardrails to help
Making up examples:
* right now, lots of people code with no tests. LLMs do better with tests. So, training LLMs to make new and better tests.
* right now, many things are left untested because it's work to build the infrastructure to test them. Now we have LLMs to help us build that infrastructure so we can use it to make better tests for LLMs.
* better languages and formal verification. If an LLM codes in Rust, there’s a class of bugs that just can’t happen. I imagine we can develop languages with built-in guardrails that would’ve been too tedious for humans to use.
ChatGPT came out a little over 3 years ago. After 5-10 more years of similar progress I doubt any humans will be required to clean up the messes created by today’s agents.
I don't think the ancient nature of the exploit chain has much bearing on the origin. I think it points away from the actual 2025 campaigns being USG-attached, but I don't think anyone was suggesting that to start with - the Google report makes it pretty clear that they believe the same code was resold to several parties, either in parallel or sequentially, around this time frame.
I think the notion here is that either:
* There's a shared upstream origin or author between this toolkit and the Operation Triangulation toolkit ahead of the use in Operation Triangulation (ie - someone sold this chain to both the Operation Triangulation authors and a third party). I actually think that the uses of specifically structured code-names internally and the overall structure of the codebase described in the Google writeup make this theory less likely; building an exploit toolkit while using these practices to cosplay as a US-government affiliated engineer would be clever and fun, but it's not something we've really seen before.
* This toolkit originated from (whether it was leaked, compromised, or resold) the same actor who was responsible for Operation Triangulation.
Right, I agree with you; my thing is mostly just differentiating between CNE enablement packages the USG itself creates vs CNE enablement packages that are on offer to every USG-aligned country, of which there are a bunch.
This seems odd to me. I have never seen obfuscation techniques in first party Apple software - certainly not in Espresso or ANECompiler and overall nowhere at all except in media DRM components (FairPlay).
Apple are really the major OS company _without_ widespread use of a first party obfuscator; Microsoft have WarBird and Google have PairIP.
> Apple are really the major OS company _without_ widespread use of a first party obfuscator
You might want to look into techniques like control-flow flattening, mixed boolean–arithmetic transformations, opaque predicates, and dead code injection — Apple uses all of these. The absence of a publicly named obfuscator doesn’t mean Apple doesn’t apply these methods (at least during my time there).
Ever wonder why Apple stopped shipping system frameworks as individual .dylib files? Here’s a hint: early extraction tools couldn’t preserve selector information when pulling libraries from the shared cache, which made the resulting decompiled pseudocode unreadable.
I'm very familiar with CFG flattening and other obfuscation techniques, thanks.
That's interesting; I suppose I must not have touched the parts of the platform that use them, and I've touched a fair amount of the platform.
Again, I _have_ seen plenty of obfuscation techniques in DRM/FairPlay, but otherwise I have not, and again, I am entirely sure the ANE toolchain from CoreML down through Espresso and into AppleNeuralEngine.framework definitely does not employ anything I would call an obfuscation technique.
> Ever wonder why Apple stopped shipping system frameworks as individual .dylib files?
If the dyld cache was supposed to be an obfuscation tool, shipping the tools for it as open source was certainly... a choice. Also, the reason early tools couldn't preserve selector information was selector uniqueing, which was an obvious and dramatic performance improvement and explained fairly openly, for example - http://www.sealiesoftware.com/blog/archive/2009/09/01/objc_e... . If it was intended to be an obfuscation tool, again it was sort of a baffling one, and I just don't think this is true - everything about the dyld cache looks like a performance optimization and nothing about it looks like an obfuscator.
I’m still relatively new to HN, but I continue to find it fascinating when people share their perspectives on how things work internally. Before joining Apple, I was a senior engineer on the Visual Studio team at Microsoft, and it's amazing how often I bump into people who hold very strong yet incorrect assumptions about how systems are built and maintained.
> I suppose I must not have touched the parts of the platform that use them
It’s understandable not to have direct exposure to every component, given that a complete macOS build and its associated applications encompass tens of millions of lines of code. /s
That said, there’s an important distinction between making systems challenging for casual hackers to analyze and the much harder (if not impossible) goal of preventing skilled researchers from discovering how something works.
> Also, the reason early tools couldn't preserve selector information was selector uniqueing
That isn't even remotely how we were making things difficult back then.
I led the SGX team at Intel for a while, working on in-memory, homomorphic encryption. In that case, the encryption couldn’t be broken through software because the keys were physically fused into the CPU. Yet, a company in China ultimately managed to extract the keys by using lasers to remove layers of the CPU die until they could read the fuses directly.
I’ll wrap up by noting that Apple invests extraordinary effort into making the critical components exceptionally difficult to reverse-engineer. As with good obfuscation—much like good design or craftsmanship—the best work often goes unnoticed precisely because it’s done so well.
I'm done here - you go on believing whatever it is you believe...
I'm thoroughly enjoying this thread by the way, between someone who is clearly informed and educated in platform research, and pretty enthusiastic and interested in the field, and yourself - a deeply experienced engineer with truly novel contributions to the conversation that we don't often see.
Looking very forward to more of your insight/comments. Hopefully your NDA has expired on some topic that you can share in detail!
Thank you for your comment. I started this thread just as a simple "job well done" to the authors. I didn't expect to be told that my work doesn't exist. ;-)
No one ever notices plastic surgery when it is done well. The same can be true for obfuscation. But, as I indicated, no amount of obfuscation is foolproof when dealing with experienced, well-funded attackers. The best you can do is make their task annoying.
* They haven’t said the source isn’t available to them, just that the closed nature of the ANE means they can’t use it in OSS.
* They’ve repeated constantly that it can’t do backprop and isn’t useful for most MLX use cases.
And really, ANE isn’t even that interesting for MLX; it’s a limited-resource, power-efficient inference engine for smallish edge models. If you want to use it you can use the Apple APIs, which, while limited, are generally “shaped” like what you’d want to do anyway. Almost every “biggish” CPU has one of these now, and Apple don’t want to give away the specifics of theirs (even though it’s been pretty thoroughly RE’d by real REs and re-summarized by Claude, like this article).
It also powers a lot of really useful features: on-device OCR, captions, voice isolation, temporal antialiasing in MetalFX, an enormous host of things in the Apple pro apps, and so on.
Yeah, I don't use any of those features. So it sounds like it's for folks who are creatives running Lightroom or Apple movie software, or some kind of Apple sound program?
I'm a dev, not a creative, unfortunately. I don't use other people's software, I generally write my own (or used to before Claude took over my world).
SkySafe (https://skysafe.io) | Wireless Engineer (SDR) | San Diego, CA | ONSITE (Hybrid)
At SkySafe we build drone detection and tracking at scale.
I am looking for a Wireless Communications Engineer with SDR expertise to join my team - I'm the hiring manager.
I need someone with a very strong background in signal theory and experience building software wireless modems. You will spend time learning about new systems from our reverse engineers (no RE experience required!), implementing both open and recovered specifications, and refactoring existing modem code to improve it in both theory (ie - what is the algorithmic state of the art for demodulating xyz thing) and practice (ie - how much memory bandwidth and CPU are you using where). As such, I'm looking for someone with true software modem experience; while DSP, FPGA, and generated-code experience are useful and relevant, you'll need to be able to read and write C++ directly.
I need someone who can come into the office in San Diego as needed to work with hardware, so this is a hybrid on-site role. This position requires access to technology which is controlled by US Export laws, so you also need to be an eligible US Person.
What we have to offer: $145-200k. Fun, small team dynamic. Work with experts in adjacent fields. Startup environment with good work-life balance and limited red tape. Rapid iteration and feedback in the wild. Good equipment at the office.
Apply: brian@ for a direct line, https://jobs.lever.co/skysafe for the formal route (we have some other roles I'm not the HM for listed there, too, if you're interested in the company!)