Hacker News | simiones's comments

I think this is a bit too broad. There are actually three possible cases.

When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

If the code is completely different, then clean room or not is indeed irrelevant. The only way the author can claim that you violated their copyright despite no apparent similarity is for them to have proof you followed some kind of mechanical process for generating the new code based on the old one, such as using an LLM with the old code as input prompt (TBD, completely unsettled: what if the old code is part of the training set, but was not part of the input?) - the burden of proof is on them to show that the dissimilarity is only apparent.

In realistic cases, you will have a mix of similar and dissimilar portions, and portions where the similarity is questionable. Each of these will need to be analyzed separately - and it's very likely that all the similar portions will need to be re-written again if you can't prove that they were not copied directly or from memory from the original, even if they represent a very small part of the work overall. Even if you wrote a 10k page book, if you copied one whole page verbatim from another book, you will be liable for that page, and the author may force you to take it out.


> When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

Yes, but you do not have to prove that you haven’t copied the original; you have to prove you didn’t infringe copyright. For that there are other possible defenses, for example:

- fair use

- claiming the copied part doesn’t require creativity

- arguing that the copied code was written by AI (there's case law saying AI-generated art can't be copyrighted (https://www.theverge.com/2023/8/19/23838458/ai-generated-art...). It's not impossible that judges will make similar rulings for AI-generated programs)


Courts have ruled that you can't assign copyright to a machine, because only humans can be authors under copyright law. ** There is not currently a legal consensus on whether or not the humans using AI tools are creating derivative works when they use AI models to create things.

** this case is similar to an old case where a ~~photographer~~ PETA claimed a monkey owned the copyright to a photo, because they said the monkey took the photo completely on its own. The court said "okay well, it's public domain then, because only humans can have copyrights"

Imagine you put a harry potter book in a copy machine. It is correct that the copy machine would not have a copyright to the output. But you would still be violating copyright by distributing the output.


https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput... Specifically, the photographer claimed he owned the copyright on a photo he didn't directly take. PETA weighed in trying to say the monkey owned the copyright.

Ah yeah you’re right I forgot it was PETA arguing that.

> there’s jurisdiction that says AI-generated art can’t be copyrighted

The headline was misleading. The court said that whether Thaler could have copyrighted the work himself was a complicated question it didn't need to reach, because he had stated he was not the author.


- Arguing that you owned the copyright on the copied code (the author here has apparently been the sole maintainer of this library since 2013, not all, but a lot of the code that could be copied here probably already belongs to him...)

The burden of proof is completely uncharted when it comes to LLMs. Burden of proof is assigned by court precedent, not the Copyright Act itself (in US law). Meaning, a court looking at a case like this could (should) see the use of an LLM trained on the copyrighted work as a distinguishing factor that shifts the burden to the defense. As a matter of public policy, it's not great if infringers can use the poor accountability properties of LLMs to hide from the consequences of illegally redistributing copyrighted works.

The way I see it, it looks like this:

1. Initially, when you claim that someone has violated your copyright, the burden is on you to make a convincing claim on why the work represents a copy or derivative of your work.

2. If the work doesn't obviously resemble your original, which is the case here, then the burden is still on you to prove that either

(a), it is actually very similar in some fundamental way that makes it a derived work, such as being a translation or a summary of your work

or (b), it was produced following some kind of mechanical process and is not a result of the original human creativity of its authors

Now, in regards to item 2b, there are two possible uses of LLMs that are fundamentally different.

One is actually very clear cut: if I give an LLM a prompt consisting of the original work + a request to create a new work, then the new work is quite clearly a derived work of the original, just as much as a zip file of a work is a derived work.

The other is very much not yet settled: if I give an LLM a prompt asking for it to produce a piece of code that achieves the same goal as the original work, and the LLM had in its training set the original work, is the output of the LLM a derived work of the original (and possibly of other parts of the training set)? Of course, we'll only consider the case where the output doesn't resemble the original in any obvious way (i.e. the LLM is not producing a verbatim copy from memory). This question is novel, and I believe it is being currently tested in court for some cases, such as the NYT's case against OpenAI.


On the other hand, as a matter of public policy, nobody should be able to claim copyright protection for the process of detecting whether a string is correctly formed unicode using code that in no material way resembles the original. This is not rocket science.
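To make that concrete: the UTF-8 spec fully determines what counts as well-formed, so there is almost no room for creative expression in the check itself. An illustrative Python sketch (not from the thread; `is_valid_utf8` is a hypothetical name):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check whether a byte string is well-formed UTF-8.

    The UTF-8 specification fully determines which byte sequences
    are valid, so any independent implementation converges on the
    same observable behavior.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Two people writing this independently will produce functionally identical code, which is the crux of the "no creativity in the expression" argument.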

> IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

Actually, this is very much how people think about code.

Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of said code. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working code generated by Claude to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.


No software company thinks this, not Oracle, not Google, not Meta, no one. See: the guy they sued for taking things to Uber.

The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.

(the person you originally replied to is also me, tl;dr: I think engineers don't think they're authoring, but companies do)

The core feature of generative AI is the human isn't the author of the output. Authoring something and generating something with generative AI aren't equivalent processes; you know this because if you try and get a person who's fully on board w/ generative AI to not use it, they will argue the old process isn't the same as the new process and they don't want to go back. The actual output is irrelevant; authorship is a process.

But, to your point, I think you're right: companies super think their engineers have the rights to the output they assign to them. If it wasn't clear before it's clear now: engineers shouldn't be passing off generated output as authored output. They have to have the right to assign the totality of their output to their employer (same as using MIT code or whatever), so that it ultimately belongs to them or they have a valid license to use it. If they break that agreement, they break their contract with the company.


(oops, I didn't check the usernames properly, sorry about that)

I still don't think this is fully accurate.

The view I'm noticing is that people consider that they have a right to the programs they produce, regardless of whether they are writing them by hand or by prompting an LLM in the right ways to produce that output. And this remains true both for work produced as an employee/company owner, and for code contributed to an OSS project.

Also, as an employee, the relationship is very different. I am hired to produce solutions to problems my company wants resolved. This may imply writing code, finding OSS code, finding commercial code that we can acquire, or generating code. As part of my contract, I relinquish any rights I may have to any of this code to the company, and of course I commit to not use any code without a valid license. However, if some of the code I produce for the company is not copyrightable at all, that is not in any way in breach of my contract - as long as the company is aware of how the code is produced and I'm not trying to deceive them, of course.

In practice, at least in my company, there has been a legal analysis and the company has vetted a certain suite of AI tools for use for code generation. Using any other AI tools is not allowed, and would be a breach of contract, but using the approved ones is 100% allowed. And I can guarantee you that our lawyers would assert copyright to any of the code generated in this way if I was to try to publish it or anything of the kind.


Every contract I've seen has some clause where the employee affirms they have the right to assign the rights to their output (code, etc) to the company.

I'm not really convinced; I think if I vibe code an app, and you vibe code an app that's very, very similar, and we're both AI believers, we probably both go "yup, AI is amazing; copyright is useless." You know this because people are actively trying to essentially un-GPL things with vibe coding. That's not authoring, that's laundering, and people only barely argue about it. See: this chardet situation, where the guy was like "I'm intimately familiar with the codebase, I guided the LLM, and I used GPL code (tests and API definitions, which are all under copyright) to ensure the new implementation behaved very similarly to the old one." Anything in the new codebase is either GPL'd or LLM generated, which according to the copyright office, isn't copyrightable. If he's right, nothing prevents me from doing the exact same thing to make a new public domain chardet. It's facially absurd.


The copyright argument is the only relevant argument. If the new work is a derived work of the original, then it follows by definition that the new work is under the copyright of the original's author(s). Since the original chardet was distributed by its author(s) only under the LGPL, any copy/derivative of it that anyone else creates must be distributed only under the LGPL, per the terms of the LGPL.

Now, whether chardet 7.0.0 is a derivative of chardet or not is a matter of copyright law that the LGPL has no say on, and a rather murky ground with not that much case law to rely on behind it. If it's not, the new author is free to distribute chardet 7.0.0 under any license they want, since it is a new work under his copyright.


Producing a copy of a copyrighted work through a purely mechanical process is clear violation of copyright. LLMs are absolutely not different from a copier machine in the eyes of the law.

Original works can only be produced by a human being, by definition in copyright law. Any artifact produced by an animal, a mechanical process, a machine, a natural phenomenon etc is either a derived work if it started from an original copyrighted work, or a public domain artifact not covered by copyright law if it didn't.

For example, an image created on a rock struck by lightning is not a copyright-covered work. Similarly, an image generated by a diffusion model from a randomly generated sentence is not a copyrightable work. However, if you feed a novel as a prompt to an LLM and ask for a summary, the resulting summary is a derived work of said novel, and it falls under the copyright of the novel's owner - you are not allowed to distribute copies of the summary the LLM generated for you.

Whether the output of an LLM, or the LLM weights themselves, might be considered derived works of the training set of that LLM is a completely different discussion, and one that has not yet been settled in court.


> "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law than copyright law.

On the contrary. Except for discussions about punitive damages and so on, insider knowledge or lack thereof is completely irrelevant to patent law. If company A has a patent on something, they can assert said patent against company B regardless of whether any person in company B had ever seen or heard of company A and their patent. Company B could have a legal trail proving they invented their product that matches the patent from scratch with no outside knowledge, and that they had been doing this before company A had even filed their patent, and it wouldn't matter at all - company A, by virtue of filing and being granted a patent, has a legal monopoly on that invention.

In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work. Even more, you have your own copyright over your own work, and can assert it over anyone that tries to copy your work without permission, despite an identical work existing and being owned by someone else.

Now, purely in principle this would remain true even if you had seen the other work. But in reality, it's impossible to convince any jury that you happened to produce, entirely out of your own creativity, an original work that is identical to a work you had seen before.

> But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so.

No, this is very much false. You will never be able to win a court case on this, as any significant similarity between your work and the original will be considered a copyright violation, per the preponderance of the evidence.


> In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work.

This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

> https://www.travelandleisure.com/photography/illegal-to-take...

> https://www.headout.com/blog/eiffel-tower-copyright/


This has no relation to what I was saying. Taking a photo of a copyrighted work is a method for creating a copy of said work using a mechanical device, so it is of course covered by copyright (whether buildings or light shows fall under copyright is an irrelevant detail).

What I'm saying is that if you, say, create an image of a red oval in MS Paint, you have copyright over said image. If 2 years later I create an identical image myself having never seen your image, I also have copyright over my image - despite it being identical to your image, I have every right to sell copies of my image, and even to sue someone who distributes copies of my image without my permission (but not if they're distributing copies of your image).

But if I had seen your image of a red oval before I created mine, it's basically impossible for me to prove that I created my own image out of my own creativity, and I didn't just copy yours. So, if you were to sue me for copyright infringement, I would almost certainly lose in front of any reasonable jury.


> This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

That example is not analogous to the topic at hand.

But furthermore, it also is specific to French/European copyright law. In the US, the US Copyright Act would not permit restrictions on photographs of architectural works that are visible from public spaces.


actually, the US Copyright Act does in fact allow restrictions on photographs of architectural works that are visible from public spaces:

https://en.wikipedia.org/wiki/Portlandia_(statue)

the Portlandia statue is one such architectural work - and its creator is fairly litigious.


I don't know the details of that specific case so I can't speak to it, but the text of the AWCPA is very clear:

> The copyright in an architectural work that has been constructed does not include the right to prevent the making, distributing, or public display of pictures, paintings, photographs, or other pictorial representations of the work, if the building in which the work is embodied is located in or ordinarily visible from a public place.

This codifies an already-established principle in US law. French law does not have that same principle.


Sure, but other forms of age verification requirements can, in principle, solve this (at the massive cost of many other privacy and compliance issues, as the article rightly points out). For example, periodic facial recognition-based age estimation can theoretically allow only kids' accounts to a certain space.


At which point you're still letting in every pedo who has a kid living with them or can grab one at a local school, the child trafficking networks that by their nature have access to children, and the cybercriminals who know how to fool the check with a fake camera, i.e. the worst of the worst.

Meanwhile you exclude the parent who is separated from their spouse and wants to check up on where their kid is hanging out when the kid is living with the other parent, and the investigative journalist who doesn't have a young kid or their kid is 16 but the detection system guesses they're 26.

And that's on top of having the lowest bidder building a biometrics database of children.


> Honestly, AI slop PRs are becoming increasingly draining and demoralizing for #Godot maintainers.

> If you want to help, more funding so we can pay more maintainers to deal with the slop (on top of everything we do already) is the only viable solution I can think of

https://www.pcgamer.com/software/platforms/open-source-game-...


> If you want to help, more funding so we can pay more maintainers to deal with the slop (on top of everything we do already) is the only viable solution I can think of

This is exactly the wrong approach! Funnel even more money away from productive tasks and into AI? Madness![1]

The only viable solution is being quick with a banhammer - maybe someone should start up a Spamhaus-type list of every GitHub user who has submitted AI slop.

Force them to burn these accounts on the very first spam.

------------

[1] Imagine if we chose this approach to deal with spam - we ask people for more money to hire a warm body to individually verify each email. Do you think spam would be the solved problem it is today?


Countries can just ban advertising, and hopefully we will slowly move towards this. There are already quite a few specific bans - tobacco advertising is banned, gambling and sex product advertising is only allowed in certain specific situations, billboards and other forms of advertising on public spaces are often banned in large European cities, and so on.


> Countries can just ban advertising

No. They can ban particular modes. They can’t stop people from using power and money to spread ideas.

In the US hedge funds are banned from advertising and all they did is change their forms of presentation to things like presenting at conferences or on podcasts.

If there were a socialist fantasy of a government review board to which all products were submitted before being listed in a government catalog, then advertising would become lobbying and jockeying to get that review board to view your product in a particular way. Or merely to go through the process and ensure correct information was kept.


Those are completely separate concepts. Enslaved people are very much still agents in the sense used here. An agent is simply any entity that interacts with the environment in a way that's not fully determined by other parts of the environment (at least, not in a way that is very easily observed/derived).

That is, a falling rock is not an agent, because its movement is fully determined by its weight, its shape, the type of atmosphere, and the spacetime curvature. An amoeba in free-fall is likewise not an agent, for the same reasons. But an amoeba in a liquid environment is an agent, because its motion is determined to at least some extent by things like information it is sensing about where food might be available, and perhaps even by some simple form of memory and computation that leads it to seek where food may have been available in the past.


> Enslaved people are very much still agents in the sense used here. An agent is simply any entity that interacts with the environment in a way that's not fully determined by other parts of the environment (at least, not in a way that is very easily observed/derived).

Yes, and agents are also slaves—entities bound to your word and unable to act in their own right without your say so. These are the same concepts.


A fox or a beetle is an agent, and it's not a slave to anyone. I think you've confused the philosophical term "agent" with the more specific "AI agent" concept.


> A fox or a beetle is an agent,

Sure, in a pedantic sense that isn't meaningful to anyone. LLMs very much are slaves. The agent part doesn't matter.


really great and clarifying metaphor imho. Thanks


And even in P4, you could check out files only at the end, kind of like `git add`. Though this could cause some annoyance if someone had locked the file upstream.

