>it's also not using many of the complex and more efficient extraction approaches that have been used on GANs and such in prior research
Some links?
>most of their 175 million images comes from effectively "retrying" each prompt 500 times
The prompts come from the most duplicated samples in the dataset, a really important detail if you actually want to use this method in the wild. This is also one of the reasons I said this attack seems so implausible.
>they're usually much more targetted than this
Even if you target specific images, you would still need an absurd amount of luck: if even the most duplicated samples only yield 109 extractions, we can be generous and assume the whole dataset yields something like 200 matches, and the probability of hitting a memorized image with a direct attack is still on the order of one in a million (even if you know the prompt). And that's not even a model trained on a deduplicated dataset.
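A quick sanity check of that back-of-the-envelope rate, using the 175 million generations and 109 extracted images discussed above (the ~200-match figure is my generous assumption, not a number from the paper):

```python
# Per-generation hit rate implied by the numbers above.
total_generations = 175_000_000   # images generated in the attack
extracted_images = 109            # memorized images actually recovered
generous_matches = 200            # generous assumption for the full dataset

hit_rate = extracted_images / total_generations
generous_rate = generous_matches / total_generations

print(f"observed hit rate: {hit_rate:.2e}")   # ~6.23e-07
print(f"generous hit rate: {generous_rate:.2e}")  # ~1.14e-06
```

Even under the generous assumption, a single generation has roughly a one-in-a-million chance of reproducing a training image, and that's with the attacker already knowing which prompts to try.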
>Is it implausible if they've done it in this paper?
For extracting images in the wild, yes. The authors of the paper have access to the dataset, so they can sort prompts and images by how often they appear in it, and they have an enormous amount of compute to throw at the problem: generating 175 million images with a diffusion model is an extremely resource-intensive task.
Anyway, I don't think the point was to show that people can stumble onto these incidents by accident, but rather that extraction is possible at all. It's hard to see how this won't affect the ongoing suit.
In the case of Stable Diffusion, yes, the dataset is publicly available, but these types of attacks would make much more sense if the attacker wants to extract private data.