
> It's not like Meta can remove these books from the training set without retraining from scratch (or at least the last checkpoint before they were used).

They probably can:

https://github.com/zjunlp/EasyEdit
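
For what it's worth, here's a rough sketch of what editing with that repo looks like, going off its README (class names, hparam paths, and return values are from memory and may have changed; the example fact is made up):

    from easyeditor import BaseEditor, ROMEHyperParams

    # rough sketch per the EasyEdit README; exact class names, hparam
    # paths, and return signature may differ between versions
    hparams = ROMEHyperParams.from_hparams('./hparams/ROME/llama-7b')
    editor = BaseEditor.from_hparams(hparams)

    # rewrites one stored association in place, no retraining from scratch;
    # note this edits individual "facts" -- it does not excise documents
    metrics, edited_model, _ = editor.edit(
        prompts=['Who wrote The Firm?'],
        ground_truth=['John Grisham'],
        target_new=['an unknown author'],
        subject=['The Firm'],
    )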

> I wonder if this is going to cause issues down the road.

There are some popular Stable Diffusion models, run by small businesses, that I am certain have CSAM in them because they have a particular 4chan model in their merging lineage.
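
(For context on "merging lineage": these community checkpoints are mostly made by weighted-averaging the raw weights of other checkpoints, so whatever a parent model learned gets blended into every descendant. A rough sketch of the common "weighted sum" merge, with made-up file names:)

    import torch

    # rough sketch of the "weighted sum" checkpoint merge popularized by
    # community UIs; assumes both files are plain state dicts
    def merge(path_a, path_b, alpha=0.5):
        a = torch.load(path_a, map_location="cpu")
        b = torch.load(path_b, map_location="cpu")
        return {k: (1 - alpha) * a[k] + alpha * b[k]
                for k in a.keys() & b.keys()}

    # every parent's weights get blended in, so the lineage is inescapable
    child = merge("base_model.ckpt", "sketchy_finetune.ckpt", alpha=0.3)
    torch.save(child, "merged.ckpt")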

... And yet, it hasn't blown up so far? I have no explanation, but running "illegal" weights seems more sustainable than I would have expected.



I’ve been wondering when the landmark moral panic against Civit.AI and the coomer crowd will start. People have no idea just how much porn this stuff is producing. One of the top textual inversions right now is an… age slider… (https://civitai.com/models/65214/age-slider) ewww. It’s also extremely well rated and reviewed on there. I’m terrified of the impending backlash, because depending on what happens, the party going on in AI could end.


People have been saying this about underage hand-drawn hentai forever, but it's still around.

Not that I am disagreeing with you. What I find particularly disturbing are the paid services for this.

Also, I have seen two separate OnlyFans pimps ask for help in a text-generation chatroom. Something about automating "private" texting from their "girls."


It’s trivial to use these methods to produce real-looking images, or even content in the likeness of real people…


Yeah. I made fine-tuned models of my daughter and niece, and I definitely have to put “sexy, naked,” and the like in the negative prompt when using them.
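
(For anyone unfamiliar: a negative prompt is just a second text conditioning that the sampler steers away from at each denoising step. A minimal diffusers sketch, using the stock SD 1.5 model id:)

    import torch
    from diffusers import StableDiffusionPipeline

    # minimal sketch: negative_prompt is a second conditioning the
    # sampler is steered *away* from during denoising
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        prompt="portrait photo, 50mm, natural light",
        negative_prompt="sexy, naked, nsfw, blurry, deformed",
    ).images[0]
    image.save("out.png")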

I don’t think society is going to have a hissy fit until some app comes along that makes it super easy for people to train good models of specific people locally and then generate whatever they want. That day’s coming really soon, though.


There are tons of web services for this. They are just obscure and distributed enough to avoid public ire.

The pieces for local LoRA training are all there, but honestly the tyranny of CUDA is the biggest blocker for the average person.
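
(The LoRA trick itself is tiny, for what it's worth. A self-contained PyTorch sketch of the low-rank adapter idea, not any particular trainer's code:)

    import torch
    import torch.nn as nn

    # sketch of a LoRA adapter: freeze the base weight W and learn a
    # low-rank update, y = Wx + (alpha/r) * B(Ax)
    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)  # base stays frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

Only the tiny A and B matrices get trained and shipped, which is why LoRA files are megabytes instead of the multi-gigabyte checkpoints mentioned downthread.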


Sure, but it's still not super user-friendly. You upload photos, get a 2 GB checkpoint file, and then have to run it with obscure, sometimes hard-to-install programs.

I know there was a phone app that did a limited version of this, where they generated profile images for you, and they made bank. I'm a little surprised nobody has tried going whole hog, if the app stores would even allow it.


That is not at all the same thing as removing the books.


> They probably can:

No, actually they probably can’t. There is no verifiable way to remove the data from the model short of removing every instance of that information from the training data and retraining. The project you linked only describes a selective fine-tuning approach.



Until you get models with completely disentangled feature spaces, such that you know the influence of a piece of data has been completely removed (at the limit, this is something like an embedding DB), there is absolutely no way you can claim you’ve removed the data from the model.

At most, these efforts will amount to data laundering: it will become impossible to prove that a piece of data was used to train the model, which is not the same as conclusive proof that it was removed.
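
(To make the "embedding DB" limit concrete: in a retrieval store, a document's entire influence is a single entry, so deletion is trivially provable. A toy sketch:)

    import numpy as np

    # toy sketch of the "embedding DB" limit: each document's influence
    # is exactly one entry, so removal is verifiable by inspection
    store = {}  # doc_id -> (vector, text)

    def add(doc_id, vector, text):
        store[doc_id] = (np.asarray(vector, dtype=np.float32), text)

    def forget(doc_id):
        store.pop(doc_id, None)  # after this, nothing of the doc remains

    def query(vector, k=5):
        v = np.asarray(vector, dtype=np.float32)
        ranked = sorted(store.items(), key=lambda kv: -float(v @ kv[1][0]))
        return [(doc_id, text) for doc_id, (_, text) in ranked[:k]]

Nothing like that holds for a dense transformer, where every training example nudges shared weights.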


Which means we are probably at least 5-10 years away from verifiable removal that a court of law will recognize.


This assumes it's possible. I naively assume it isn't possible without harming the model beyond just the content of the book.


They can probably prevent LLaMA from spitting out verbatim quotes from the books well enough to make proof difficult.

... But yeah, fundamentally the only way to throw out the books is to throw out the weights.
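
(The test a plaintiff would presumably run looks something like this: slide an n-gram window from the book over the model's output and count exact matches. Toy sketch; the tokenization and window size are arbitrary:)

    # toy memorization probe: count long word n-grams from the book that
    # reappear verbatim in model output (n chosen arbitrarily here)
    def verbatim_hits(book: str, output: str, n: int = 50) -> int:
        bt, ot = book.split(), output.split()
        grams = {tuple(bt[i:i + n]) for i in range(len(bt) - n + 1)}
        return sum(tuple(ot[i:i + n]) in grams
                   for i in range(len(ot) - n + 1))

Fine-tuning against exactly this kind of probe is what makes proof difficult without making the underlying copy go away.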


That is quite the spicy claim.



