
>it's also not using many of the complex and more efficient extraction approaches that have been used on GANs and such in prior research

Some links?

>most of their 175 million images comes from effectively "retrying" each prompt 500 times

The prompts come from the most duplicated samples in the dataset. That is a really important caveat if you actually want to use this method in the wild, and it is also one of the reasons I said this attack seems so implausible.

>they're usually much more targeted than this

Even if you target specific images, you would still need an absurd amount of luck: starting from the most duplicated samples, they only got 109 matches. Even being generous and assuming the whole dataset would yield something like 200 matches, the probability of extracting a given image with a direct attack is still less than one in a million (even if you know the prompt), and that is with a model that was not even trained on a deduplicated dataset.
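For a rough sense of scale, here is the back-of-the-envelope arithmetic using only the numbers quoted in this thread (175 million generations, 500 retries per prompt, 109 matches). It is a sketch of the success rate, not the paper's own accounting, and the constant names are illustrative.

```python
# Back-of-the-envelope success rates using the numbers quoted in this thread.
TOTAL_GENERATIONS = 175_000_000   # images generated, per the thread
RETRIES_PER_PROMPT = 500          # each prompt "retried" 500 times
EXTRACTED = 109                   # matches reported for the most duplicated samples

prompts = TOTAL_GENERATIONS // RETRIES_PER_PROMPT   # number of distinct prompts tried
per_generation = EXTRACTED / TOTAL_GENERATIONS      # hit rate per generated image
per_prompt = EXTRACTED / prompts                    # hit rate per prompt

print(f"prompts tried:            {prompts:,}")              # 350,000
print(f"hits per generated image: {per_generation:.2e}")     # 6.23e-07
print(f"hits per prompt:          {per_prompt:.2e}")         # 3.11e-04
```

In other words, fewer than one in a million generated images was a match, which works out to roughly one hit per 3,200 prompts, and that is among the most duplicated samples.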



Is it implausible if they've done it in this paper?

This paper seems to answer the question of, "can SD, even just in theory, produce copyright-infringing work?" with "yes, it can."

For other images, which are the product of thousands (if not millions) of source images, it becomes murkier.


>Is it implausible if they've done it in this paper?

For extracting images in the wild, yes. The authors of the paper have access to the dataset, so they could sort prompts and images by how often they appear in it, and they have an enormous amount of compute to throw at the problem: generating 175 million images with a diffusion model is an extremely resource-intensive task.
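A minimal sketch of that ranking step, assuming the dataset is available as (caption, image) pairs and using exact image hashes as a hypothetical stand-in for whatever near-duplicate detection the authors actually used. The function name `rank_by_duplication` and the toy data are illustrative only; the point is that the most duplicated images are the ones whose captions are worth spending the retries on.

```python
from collections import Counter

def rank_by_duplication(pairs, top_k):
    """Return (caption, count) for the top_k most duplicated images.

    pairs: iterable of (caption, image_hash) tuples. image_hash is a
    hypothetical stand-in for real near-duplicate detection.
    """
    pairs = list(pairs)
    counts = Counter(image_hash for _, image_hash in pairs)
    # Keep one caption per image (the first one seen) to use as the prompt.
    caption_for = {}
    for caption, image_hash in pairs:
        caption_for.setdefault(image_hash, caption)
    return [(caption_for[h], n) for h, n in counts.most_common(top_k)]

# Toy, made-up dataset: h1 appears three times, h2 twice, h3 once.
dataset = [
    ("a famous painting", "h1"),
    ("famous painting, oil on canvas", "h1"),
    ("a famous painting", "h1"),
    ("a photo of a dog", "h2"),
    ("a photo of a dog", "h2"),
    ("sunset over the sea", "h3"),
]

for caption, count in rank_by_duplication(dataset, top_k=2):
    print(count, caption)   # "3 a famous painting", then "2 a photo of a dog"
```

Anyone holding a public dataset can run this kind of ranking; the expensive part is the generation that follows it.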


I believe everyone has access to their dataset, no? https://laion.ai/blog/laion-5b/

Anyway, I don't think the point of this was to indicate that people can stumble on these incidents, but rather that it is possible. It's hard to see how this won't affect the ongoing suit.


In the case of Stable Diffusion, yes, the dataset is publicly available, but these types of attacks would make much more sense if the attacker wants to extract private data.


Ah, I understand what you're getting at. True.



