Now imagine you're an expert witness on the stand in the case that eventually settles this. I don't think this will be persuasive even if it's technically sound. How you end up with a copyrighted work doesn't seem relevant: the model still produced something eerily similar to a copyrighted work.
"Copyright laundering" will be the phrase of the era. Throw Picasso in with a thousand reproductions, wash it with DeviantArt, get Picasso back out. Does it matter than it's algorithmically derived rather than a stroke-by-stroke reproduction? There's probably already ample case law around reproductions to deal with this.
It's technically sound though. Let me give you an example.
y = f(x) = x + 2
This is what's memorized. But with that equation you can plug in a specific x to get:
3 = f(1) = 1 + 2
The training set here would be (1,3). What is memorized is y = f(x) = x + 2. You can literally see that (1,3) is NOT in the model, EVEN though the model CAN produce (1,3) given the right input.
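To make the toy example above concrete, here's a minimal sketch: the "model" below stores only the rule f(x) = x + 2, and you can inspect its compiled constants to confirm the training pair (1,3) appears nowhere in it, even though it reproduces that pair on demand.

```python
# The "model": it memorizes the rule, not the training pair.
def f(x):
    return x + 2

training_pair = (1, 3)

# Inspect every constant baked into the model's definition.
# For this function that's just (None, 2) -- no 1, no 3, no (1,3).
stored = f.__code__.co_consts
assert 3 not in stored
assert training_pair not in stored

# Yet given the right input, the model reproduces the pair exactly.
assert f(training_pair[0]) == training_pair[1]
```

The point of the sketch is that "the output matches the training data" and "the training data is stored in the model" are different claims; the assertion on `co_consts` checks the second one directly.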
I think the technical part of this is sound. People will just take the face-value explanation, which is "I see a 3! Therefore a three was memorized." But technically the 3 was NOT memorized. This is categorically true.
You're right. The persuasive part of this argument is not very good, though, as you can see from this thread.
"Copyright laundering" will be the phrase of the era. Throw Picasso in with a thousand reproductions, wash it with DeviantArt, get Picasso back out. Does it matter than it's algorithmically derived rather than a stroke-by-stroke reproduction? There's probably already ample case law around reproductions to deal with this.