It makes no sense at all to do that. The only thing that matters is whether the ...

eschaton · 2026-06-05T21:22:58 1780694578

That’s not the only thing that matters. The provenance of the code also matters enormously, specifically whether the person contributing it actually has the right to do so.

If I contributed code to an Open Source project behind my old employer’s back, that would have been bad: because of the contract I signed with them, that code was owned by them and not me, even if I wrote it on my own time using my own equipment.

If I copied code out of an AGPLv3-licensed codebase and contributed it to a BSD-licensed codebase without telling anyone, that would have been bad, because I did not have the right to change the license on that code to BSD (or change the license on the codebase to which I was contributing to AGPLv3).

If you use an LLM to produce code, you may well be doing the latter, since an LLM is actually just regurgitating portions of its inputs. This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM to attempt to contribute code I recognized from a specific Open Source project under one license to another project under a different license, while claiming they “wrote it themselves.”

Any project that accepts contributions needs to take liability seriously and manage its risk appropriately.

red75prime · 2026-06-06T03:35:58 1780716958

> This is not a hypothetical scenario; I’ve personally encountered a case of someone using an LLM to attempt to contribute code I recognized from a specific Open Source project under one license to another project under a different license

You say you "recognized code". Does that mean you weren't able to find an exact match?

> an LLM is actually just regurgitating portions of its inputs

You seem to be talking about the inputs to the autoregressive pretraining stage, correct? Then that's not how LLMs work, unless we define "portions" as "blocks of a few letters."

eschaton · 2026-06-06T03:44:23 1780717463

I found exact matches. I also found inexact matches, where C functions had been turned into C++ member functions and the like. “Recognized” does not somehow imply a lack of precision.

The LLM the person used was trained on a very large corpus of Open Source code, and reproduced that code exactly. Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.

red75prime · 2026-06-06T04:07:43 1780718863

> I found exact matches.

Were those functions trivial? Say, with a 1% probability that someone who had not seen them would write them like that?

> Just like LLMs have reproduced chapters of books and articles from the New York Times exactly.

Have you read the articles? As far as I remember, they fed large chunks of an article multiple times to an LLM to sometimes get a not-so-long exact match. That could mean that LLMs can infer a style and that humans are predictable.

Topfi · 2026-06-06T11:01:33 1780743693

> […] fed large chunks of an article multiple times to an LLM […]

So they had to prompt? An LLM? I’ve heard this argument before and still don’t get what it’s trying to say. These models do not output anything unless prompted; that’s not any kind of gotcha.

On the code-output front, there is a lot of relevant evidence beyond the NYT lawsuit [0].

If I slightly modify GPL code, that doesn’t give me the right to relicense.

[0] https://arxiv.org/html/2601.02671, https://arxiv.org/abs/2506.12286, and https://ai.stanford.edu/blog/verbatim-memorization/

eschaton · 2026-06-06T04:11:49 1780719109

No, the functions weren’t trivial, and a lot of the surrounding code and structure bore substantial similarities as well. If you saw the two files side by side without knowing an LLM was involved, you’d assume they were the result of a copy-paste-adjust process.

red75prime · 2026-06-06T04:50:51 1780721451

I can only speculate that the model that generated the code hadn't undergone selective unlearning for verbatim data (SUV) or something similar. As I'm sure you understand, "sometimes generates verbatim code" and "just regurgitates [non-trivial] portions of its input" are different statements.

The possibility of SUV clearly shows that a model is doing more than "just regurgitating."

matheusmoreira · 2026-06-05T21:28:41 1780694921

"LLM produced licensed code and person contributed it" is indistinguishable from "person contributed licensed code". The LLM is irrelevant. The result is the same as if they had copy-pasted it.

eschaton · 2026-06-05T22:05:33 1780697133

Yes, exactly.

Unfortunately, a large number of people are being told (and here, you can see many who believe it) that the output of an LLM either carries no copyright or is copyrighted by the person prompting it. In other words, even right here on Hacker News it’s widely believed that LLMs “launder” copyright.

matheusmoreira · 2026-06-05T22:25:00 1780698300

Irrelevant either way. It's your name on the commit, and the code either infringes or it does not. Whether an LLM was used is immaterial.

eschaton · 2026-06-05T22:28:37 1780698517

Not irrelevant. A large number of people who would not copy and paste code from one project to another will attempt to contribute the copyright-infringing output of an LLM and not think twice.

potsandpans · 2026-06-05T21:44:19 1780695859

[flagged]

archagon · 2026-06-05T22:57:42 1780700262

Is this comment LLM generated?

Have fun with 1000x more Buns that literally no one is using or maintaining. An entire software industry built on top of a burning garbage pile of crappy, dead code.

elnatro · 2026-06-06T06:48:02 1780728482

It is; that user has responded to me using LLMs before…

int_19h · 2026-06-06T01:39:20 1780709960

> An entire software industry built on top of a burning garbage pile of crappy, dead code.

That has been the case for the last, oh, decade or so. Where do you think LLMs learned to slop code?

archagon · 2026-06-06T01:41:57 1780710117

Things have been bad, but every company using its own bespoke LLM reimplementation of rsync and similar is so, so much worse.

int_19h · 2026-06-06T02:15:10 1780712110

Why would every company do it though? They'll just all be using the same (Anthropic's) AI-enabled fork.

archagon · 2026-06-06T02:48:40 1780714120

You think Anthropic wants to be the sole maintainer of thousands of forked OSS projects...? I seriously doubt that would happen, for legal, marketing, and logistical reasons alike.

int_19h · 2026-06-06T07:12:54 1780729974

Anthropic, probably not. I could totally see Altman or even Musk deciding to do that exact thing as a showcase of sorts.

potsandpans · 2026-06-06T00:31:53 1780705913

[flagged]

archagon · 2026-06-06T00:49:28 1780706968

It just reads like LinkedIn slop. One melodramatic sentence after another.

Consider collecting related thoughts into paragraphs.

eschaton · 2026-06-05T22:12:42 1780697562

The Fortune 10 company that I spent decades at and retired from just a couple years ago noticed this issue immediately and issued a blanket ban on using these tools for the company’s own code, a ban that to my knowledge has not been rescinded. (They also started developing their own coding-specific LLM, trained solely on code they owned, around the same time.)

You might consider that the large, public players in this market have a very large incentive to promote the idea that this is not true, that they consider themselves large and powerful enough to actually flout the law, and that they plan to use the argument that enforcement will be too damaging to the economy to make their view the “new normal.”

This playbook has been run before, by Uber and Lyft, by Airbnb, by Tesla with “FSD,” and so on. It’s very clearly the approach being taken.

saagarjha · 2026-06-06T01:37:15 1780709835

They’re using Claude lmao

potsandpans · 2026-06-05T22:25:18 1780698318

[flagged]

eschaton · 2026-06-05T22:30:27 1780698627

Or you’re misinformed about what my old employer is actually doing, or how they’re doing it.

potsandpans · 2026-06-05T23:12:56 1780701176

I'm not