If you read history widely (across millennia and geographies), you'll note that most of the power-contests follow this pattern[0]. In the modern industrial world, the pattern becomes exponential rather than incremental. What I'm saying is that this is not unique to AI Labs[1]. This is caused by the deeply flawed and unbalanced system that we have constructed for ourselves.
[0]: The pattern, or, as gamers would call it, the "meta", is that every ambitious person/entity wants to control as much of the economic/material surplus as possible. The most effective and efficient (effort per unit of control) way of doing this is to make yourself into as much of a bottleneck as humanly possible. In graph theory this corresponds to betweenness centrality, and you want to maximize that value. To put it in mundane terms, you want to be as much of a monopoly as you can be (Thiel is infamous for saying this, but it does check out, historically). To maximize betweenness, or to maximize monopoly, is to maximize how much society/the economy depends on you. This is such a dominant strategy (a game-theory term; in the modern gaming world they might call it a "cheesy strat", which just means that the game lacks strategic variety, forcing players to hone that one strategy) that we even have some old laws (antitrust, etc.) designed to prevent it. And it makes a lot of sense: Standard Oil was reviled because everything in the economy either required oil or required something that did. 20th-century USA did a lot to mitigate this. It forced monopolies like AT&T to fund general research like Bell Labs (still legendary) towards a public good (a kind of tax, but probably a much more socially beneficial one). It also broke up the monopolies and passed anti-profit laws (e.g. hospitals were not allowed to make a profit until 1978; I have seen in the last 10 years a tiny cancer clinic grow into a massive gleaming hospital -- a machine that transforms sickness and grief into Scrooge McDuck vaults of cash). This monopolistic tendency of the commercial sector is a tendency towards centralization, which yields efficiency, sure, but also creates the conditions for control and rent-seeking and exploitation.
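To make the betweenness point concrete, here is a toy sketch (networkx and the little "supply chain" graph are both my own assumptions, purely for illustration):

```python
# Toy illustration of betweenness centrality as "being the bottleneck".
# The graph is entirely made up; the node sitting on the most shortest
# paths between everyone else is the one the rest of the network depends on.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("well_a", "refiner"), ("well_b", "refiner"),
    ("refiner", "factory_1"), ("refiner", "factory_2"), ("refiner", "factory_3"),
])

scores = nx.betweenness_centrality(G)
print(max(scores, key=scores.get))  # -> "refiner": the chokepoint everyone routes through
```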
[1]: Much of the cloud-computing craze was similar in character (and it also failed to deliver on some of its promises, such as reducing/replacing IT overhead -- they just renamed IT to DevOps). And Web2 itself was about creating and monopolizing a new kind of ad channel and lead-generation machine. There is a funny twist here: a capitalist society like the USA has much more deeply rooted incentives to create a panopticon than the communist states of the past ever did. Neither is pretty, of course. The communists demanded conformity and loyalty, while the capitalists demand consumption and rent.
Not really a question. Just wanted to express my gratitude for Hypothesis. I use it regularly. A few years back, I had to build a semi-formally-verified fund and account management service, and used the state-based-testing of Hypothesis to validate its correctness. Cannot express how invaluable this little framework has been.
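For anyone curious what that looks like, here is a minimal sketch of a Hypothesis stateful test (the toy in-memory ledger below is hypothetical, not the actual service): Hypothesis drives random sequences of operations against the system and a reference model, and checks the invariants after every step.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class Ledger:
    """Toy system under test: tracks per-account balances."""

    def __init__(self):
        self.balances = {}

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount

    def withdraw(self, account, amount):
        if self.balances.get(account, 0) >= amount:
            self.balances[account] -= amount
            return True
        return False


class LedgerMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.ledger = Ledger()   # system under test
        self.model = {}          # dead-simple reference model

    @rule(account=st.integers(0, 3), amount=st.integers(1, 1_000))
    def deposit(self, account, amount):
        self.ledger.deposit(account, amount)
        self.model[account] = self.model.get(account, 0) + amount

    @rule(account=st.integers(0, 3), amount=st.integers(1, 1_000))
    def withdraw(self, account, amount):
        if self.ledger.withdraw(account, amount):
            self.model[account] -= amount

    @invariant()
    def matches_model_and_never_goes_negative(self):
        assert self.ledger.balances == self.model
        assert all(v >= 0 for v in self.ledger.balances.values())


TestLedger = LedgerMachine.TestCase  # picked up by pytest/unittest
```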
A little while after that, I spoke to someone in the pharma-adjacent space who was looking at Antithesis to validate their product. At the time, Antithesis (the company) told him that it was a bad fit. I suggested something akin to my previous approach (which did not include Antithesis). No clue what they ended up doing, but it is nice to see that Hypothesis and Antithesis have finally joined forces.
And sometimes you find errors in code that absolutely should never have errors: I found an (as yet not-root-caused) error in SQLite (no crash or core dump, just returns the wrong data, and only when using SQLite in RAM-only mode). Had to move to Postgres for that reason alone. This is part of the reason why I have a strong anti-library bias (and I sound like a lunatic to most colleagues because they "have never had a problem" with $favorite_library -- to which my response is: "how do you _know_?"[0], which often makes me sound like I'm being unreasonably difficult).
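For what it's worth, the way this sort of thing tends to surface is with a differential check: run the same statements against an in-memory database and a file-backed one, and compare. A rough sketch (the schema and queries here are made up, not the ones that tripped the actual bug):

```python
# Differential check: identical statements against :memory: and a file-backed
# SQLite database should produce identical results. (Toy schema/queries only.)
import os
import sqlite3
import tempfile


def run(conn: sqlite3.Connection) -> list:
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [("a",), ("b",), ("c",)])
    return conn.execute("SELECT id, v FROM t ORDER BY id").fetchall()


with tempfile.TemporaryDirectory() as d:
    in_memory = run(sqlite3.connect(":memory:"))              # RAM-only mode
    on_disk = run(sqlite3.connect(os.path.join(d, "t.db")))   # same statements, file-backed
    assert in_memory == on_disk, f"divergence: {in_memory} vs {on_disk}"
```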
Sometimes, the only thing you can do is let the plague spread, and hope that the people who survive start showering and washing their hands.
[0]: I once interviewed at a company that sold a kind of on-prem VM hosting and storage product. They were shipping a physical machine with Linux and a custom filesystem (so not ZFS), and they bragged about how their filesystem was very fast, much faster than ZFS or Btrfs on SSDs. I asked them if they were allowed to tell me how they achieved such consistent numbers. They listed a few things, one of which was: "we disabled block-level checksumming". I asked: "how do you prevent corruption?". They said: "we only enable checksumming during the nightly tests". So, a little unsettled, I asked: "you do not do _any_ checksumming at any point in production?". They replied: "Exactly. It's not necessary". So, throwing caution to the wind (at this point I did not care to get the job), I asked: "And you've never had data corruption in production?". They said: "Never. None". To which I replied: "But how do you _know_?". My no-longer-future-coworker thought for a few seconds, and realization flashed across his face. This was a company that had actual customers on 2 continents, and was pulling in at least millions per year. They were probably silently corrupting customer data, while promising that they were the solution -- a hospital selling snake-oil, while thinking it really is medicine.
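For context, the thing they skipped is conceptually tiny. A toy sketch of verify-on-read block checksumming (this has nothing to do with their actual filesystem; it is just the idea):

```python
# Store a checksum per block at write time and verify it on every read,
# so silent corruption becomes a loud error instead of quietly wrong data.
import hashlib

BLOCK_SIZE = 4096


class CheckedStore:
    def __init__(self):
        self.blocks: dict[int, bytes] = {}
        self.sums: dict[int, bytes] = {}

    def write_block(self, index: int, data: bytes) -> None:
        assert len(data) == BLOCK_SIZE
        self.blocks[index] = data
        self.sums[index] = hashlib.sha256(data).digest()

    def read_block(self, index: int) -> bytes:
        data = self.blocks[index]
        if hashlib.sha256(data).digest() != self.sums[index]:
            raise IOError(f"checksum mismatch on block {index}")
        return data
```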
> I found an (as yet not-root-caused) error in SQLite (no crash or core dump, just returns the wrong data, and only when using SQLite in RAM-only mode).
You should report this to the SQLite developers - they are very smart and very interested in fixing SQLite correctness bugs!
Didn't get around to reporting it (huge backlog of tasks). Luckily I am working on a project that _has_ to support SQLite, so if I run into the bug again, I'll report it.
I don't believe that I can tell you the name of the company (they made me sign some NDAs before the interview, and I have no clue how enforceable those are). Also, this was in 2019, so I would be shocked if they did not fix the problem by now -- especially after I interviewed there (plus I can't be the only one to have noticed this since).
That said, you have a few data points if you want to try to triangulate them yourself: physical VM-hosting and storage product, existed since at least 2019, used the Linux kernel as hypervisor, custom FS, international customers across 2 continents. All of those data points are my recollection from 2019.
This is not exactly novel. In the 2000s, someone made a fully functioning Perl 6 runtime (Pugs) in a very short amount of time (a month, IIRC) using Haskell. The various Lisps/Schemes have always given you the ability to implement specialized languages even more quickly and ergonomically than Haskell (IMHO).
This latest fever for LLMs simply confirms that people would rather do _anything_ other than program in a (not necessarily purely) functional language that has meta-programming facilities. I personally blame functional fixedness (psychological concept). In my experience, when someone learns to program in a particular paradigm or language, they are rarely able or willing to migrate to a different one (I know many people who refused to code in anything that did not look and feel like Java, until forced to by their growling bellies). The AI/LLM companies are basically (and perhaps unintentionally) treating that mental inertia as a business opportunity (which, in one way or another, it was for many decades and still is -- and will probably continue to be well into a post-AGI future).
Even before LLMs, there was a _lot_ of deception and cheating in university. I -- and I do not say this with pride -- used to write essays for my classmates for money. In my own defense, I needed the money. I also know that in addition to homework for money, many fraternities and sororities kept copies of prior exams and assignments, and getting access to these was one of the perks of membership. Knowing what kind of questions to expect (let alone the exact questions) can easily give someone a few extra IQ points for free.
Personally, I felt that the drive to automate the parts of professors' workloads that mattered (i.e. teaching, grading, evaluation, and research), only so that they could be given work that matters less the more of it they do (i.e. publishing slightly different flavors of the same paper to meet KPIs), was oddly perverse.
The multiple-choice test and the puzzle-solving test and really any standardized test can be exploited by any group that is sufficiently organized. This is also true in corporate interviewing where corporations think (or pretend) that they are interviewing an individual, whereas they are actually interviewing a _network_ of candidates who share details about the interviewers and the questions. I know people who got rejected in spite of getting all the interview questions correct (the theory is that nobody can do that well, so they must have had help from previously rejected/accepted candidates).
The word "trust" shares a root with the word "tree" and "truth" and "druid". Most exams and interviews are trying to speed-run trust-building (note that "verification" is from the latin word that means "true"). If trust and truth are analogous to "tree", then we are trying to speed-run the growth of a tree -- much like the orange tree, in the film, _The Illusionist_. And like the orange tree, it is a near-complete illusion, a ritual meant to keep the legal department and HR department happy.
The LLMs have simply made the corruption of academia accessible to _all_ students with an internet connection (EDIT: and instantaneous and cheap, unlike a human writer).
There has never been a shortcut to building trust. One cannot LLM their way into being a (metaphorical) druid.
I do not look forward to the Voight-Kampff tests that will come to dominate all aspects of online and asynchronous human interaction.
Note that, short of homework/classwork that _can't_ be gamed by an LLM (for some fundamental reason), even the high-quality honest students will be forced to cheat, so as to not be eclipsed by the actual low-quality cheating students[0].
I imagine that we may end up wrapping around to live in-person dialectics, as were standard in the time of Socrates and Parmenides[1]. If so, this should be fun.
[0]: If left unaddressed, we may see a bimodal distribution of great and terrible students graduating college, with those in between dropping out. If college is an attempt to categorize and rank a population, this would be a major fault in that mechanism.
[1]: Not to the exclusion of the other kinds of tests; writing is still important, critical even. But it would serve as a kind of verification step, one that should inform how much the academic community trusts the writing (I can imagine that all the writers here are experiencing stage fright as they read these words).
1. This is a kind of fuzzer. In general it's just great to have many different fuzzers that work in different ways, to get more coverage.
2. I wouldn't say LLMs are "better" than other fuzzers. Someone would need to measure findings/cost for that. But many LLMs do work at a higher level than most fuzzers, as they can generate plausible-looking source code.
As someone on the SpiderMonkey team who had to evaluate some of Anthropic's bugs, I can definitely say that Anthropic's test cases were far easier to assess than those generated by traditional fuzzers. Instead of extremely random and mostly superfluous gibberish, we received test cases that actually resembled a coherent program.
I didn't even read the piece, but my bet is that fuzzers are typically limited to inputs, whereas relying on LLMs is also about finding textual patterns in the code base itself, a bit more loosely than before while still being statistically relevant.
Fuzzers and LLMs attack different corners of the problem space, so asking which is 'qualitatively better' misses the point. Fuzzers like AFL or libFuzzer with AddressSanitizer excel at coverage-driven, high-volume byte mutations and parsing-crash discovery, while an LLM can generate protocol-aware, stateful sequences, realistic JavaScript and HTTP payloads, and user-like misuse patterns that exercise logic and feature-interaction bugs a blind mutational fuzzer rarely reaches.
I think the practical move is to combine them: have an LLM produce multi-step flows or corpora and seed a fuzzer with them, or use the model to script Playwright or Puppeteer scenarios that reproduce deep state transitions, and then let coverage-guided fuzzing mutate around those seeds. Expect tradeoffs, though: LLM outputs hallucinate plausible but untriggerable exploit chains and generate a lot of noisy candidates, so you still need sanitizers, deterministic replay, and manual validation, while fuzzers demand instrumentation and long runs to actually reach complex stateful behavior.
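Concretely, the seeding half of that combination can be as simple as dumping the model's output into a corpus directory and handing it to the fuzzer. A rough sketch (the llm_generate helper and the ./js_fuzzer target are placeholders, not real APIs):

```python
# Have an LLM write plausible, protocol/feature-aware seeds, then let a
# coverage-guided (libFuzzer-style) fuzzer mutate around them.
import pathlib
import subprocess


def llm_generate(prompt: str, n: int) -> list[str]:
    """Hypothetical: ask whatever model client you use for n candidate inputs."""
    raise NotImplementedError("plug in your model client here")


def seed_corpus(corpus_dir: str = "corpus") -> pathlib.Path:
    out = pathlib.Path(corpus_dir)
    out.mkdir(exist_ok=True)
    samples = llm_generate(
        "Write 20 small JavaScript programs that stress Proxy, generators, "
        "and typed arrays in unusual but valid ways.",
        n=20,
    )
    for i, text in enumerate(samples):
        (out / f"llm_seed_{i:03}.js").write_text(text)
    return out


if __name__ == "__main__":
    corpus = seed_corpus()
    # libFuzzer-style binaries take corpus directories as positional arguments.
    subprocess.run(["./js_fuzzer", str(corpus), "-max_total_time=3600"], check=True)
```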
It's not really a question of better or worse, though. It's a more directed fuzzer than the rest, one that can craft payloads that trigger flaws deep in a flow path. It could also miss some obvious patterns that normal people wouldn't expect to be a problem (which is what most fuzzers currently test for).
Not all legal systems put the burden of proof on the accuser. In fact, many legal systems have indefinite detentions, in which the government effectively imprisons a suspect, sometimes for months at a time. To take it a step further, the plea-bargain system of the USA is really just a method to skip the entire legal process. After all, proving guilt is expensive, so why not just strong-arm a suspect into confessing? It also has the benefit of holding someone responsible for an injustice, even if the actual perpetrator cannot be found. By my personal standards, this is a corrupt system, but by the standards of the legal stratum of society, those KPIs look _solid_.
By contrast, in Germany (IIRC), a confession alone is not enough to convict; objective corroborating evidence is required.
Many legal systems follow the principle of "innocent until proven guilty", but also have many "escape hatches" that let them side-step the actual process that is supposed to guarantee that ideal principle.
EDIT: And that is just modern society. Past societies have had trial by ordeal and trial by combat, neither of which has anything to do with proof and evidence. Many such archaic proof procedures survive in modern legal systems, in a modernized and bureaucratized way. In some sense, modern trials are a test of who has the more expensive attorney (as opposed to who has a more skilled champion or combatant).
There is no comment on whether LLMs/agents have been used. I feel like projects should explicitly say if they were _or_ were not used. There is no license file, and no copyright header either. This feels like "fauxpen-source": imagine getting LEX+YACC to generate a parser, and presenting the generated C code as "open-source".
This is just another way to throw binaries over the wire, but much worse. This has the _worst_ qualities of the GPL _and_ pseudo-free-software-licenses (i.e. the EULAs used by mongo and others). It has all the deceptive qualities of the latter (e.g. we are open but not really -- similar to Sun Microsystems [love this company btw, in spite of its blunders], trying to convince people that NeWS is "free" but that the cost of media [the CD-ROM] is $900), with the viral qualities of the former (e.g. the fruit of the poison tree problem -- if you use this in your code, then not only can you not copyright the code, but you might actually be liable for infringement of copyright and/or patents).
I would appreciate it if the contributor, mrconter11, would treat HN as an internet space filled with intelligent, thinking people, and not a bunch of shallow and mindless rubes. (Please (1) explicitly disclose both the use and the absence of use of LLMs -- people are more likely to use your software this way, and it preserves the integrity of the open-source ecosystem, and (2) share your prompts and sessions.)
That is (slightly) reassuring (but the rest of his portfolio does not inspire confidence). Nevertheless, we should be required to disclose whether the code has been (legally) tainted or not. This will help people make informed decisions, and will also help people replace the code if legal consequences appear on the horizon, or if they are ready to move from prototype to production.