Getting ML to reliably do something specific, like flagging an image as inappropriate, is either (a) already well understood in the field or (b) impossible to do reliably and not getting easier, which is exactly why these content moderation companies exist. Otherwise we'd already be doing it automatically. None of the recent advances are in a direction that brings us closer to being able to do this.
I think you're probably in the majority, though, who misunderstand what GPT et al. are doing and think of them as a general advance in ML, as opposed to just a different kind of demo that works most of the time.
Are you sure about that? We haven't seen what multimodal GPT-4 can do in the wild, and it can take the full conversation history into context in addition to the images. If it can easily understand visual jokes, are you sure it can't detect CSAM?
We can also reliably GENERATE inappropriate content now: simply adding an 'nsfw' tag to a Stable Diffusion prompt flips a normal image into an inappropriate one. It doesn't sound very difficult to reverse that process.
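To make "reversing it" concrete, here's a minimal sketch of that idea using CLIP (the text-image model that Stable Diffusion's text conditioning builds on) as a zero-shot classifier. The model name is real, but the label prompts and threshold are illustrative assumptions, not a vetted moderation policy:

```python
# Minimal sketch: zero-shot "nsfw vs. safe" scoring with CLIP.
# Assumption: label prompts and the 0.9 threshold are invented for
# illustration; a real moderation pipeline would tune and validate both.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def nsfw_score(image: Image.Image) -> float:
    """Return the probability CLIP assigns to the 'nsfw' label."""
    inputs = processor(
        text=["a safe, work-appropriate photo", "nsfw, explicit content"],
        images=image,
        return_tensors="pt",
        padding=True,
    )
    logits = model(**inputs).logits_per_image  # shape: (1, 2)
    return logits.softmax(dim=-1)[0, 1].item()

image = Image.open("upload.jpg")  # hypothetical uploaded file
if nsfw_score(image) > 0.9:
    print("flag for human review")
```

The same trick works with any label pair, which is why people describe it as "reversing" the generative tagging rather than training a dedicated classifier from scratch.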
Also, for these services, you don't need it to be perfect. If flagging accuracy merely goes up significantly, that already means far fewer human workers are needed for review.
The AI ecosystem as a whole is also booming massively in hardware, datasets, talent, and software infrastructure, which makes development in traditional ML faster too.
As someone who is very optimistic about GPT, I'd say you are utterly misunderstanding what content moderation teams deal with if you think GPT can accurately assess the majority of their cases.
Accuracy will not go up significantly when humans are already struggling. And I'm not talking about the mental toll of the job; I'm talking about being unable to determine whether a case actually violates the TOS or not.
This is going to get even more difficult, for both humans and AI, as content is generated by AI at alarmingly fast rates. So even with AI used to combat AI, there will still be a gigantic tidal wave of questionable shit to go through.
The options aren't 100% human vs. 100% AI, though?
I'm no expert, so maybe you can clarify why high accuracy/reliability is that important for an initial AI analysis? I would expect the vast majority of reports to be straightforward matters.
There's a human poster involved, who can trigger a review if they disagree with the decision, and that review can be attended to by a much smaller pool of humans.
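A minimal sketch of that split, assuming a hypothetical triage policy where the model auto-resolves only high-confidence cases and everything ambiguous, plus any appeal, lands in the much smaller human queue. Every name and threshold here is invented for illustration:

```python
# Hypothetical triage sketch: the classifier handles clear-cut cases,
# humans handle the ambiguous middle and any appeals. Thresholds and
# field names are made up for illustration.
from dataclasses import dataclass

@dataclass
class Report:
    content_id: str
    model_score: float       # classifier's P(violates TOS), from any model
    user_appealed: bool = False

def route(report: Report) -> str:
    if report.user_appealed:
        return "human_review"      # appeals always reach a person
    if report.model_score >= 0.98:
        return "auto_remove"       # clear violation
    if report.model_score <= 0.02:
        return "auto_dismiss"      # clearly fine
    return "human_review"          # ambiguous middle goes to the queue

print(route(Report("c1", 0.995)))        # auto_remove
print(route(Report("c2", 0.40)))         # human_review
print(route(Report("c3", 0.01, True)))   # human_review (appeal)
```

The point of the design is that the human pool only scales with the volume of ambiguous cases and appeals, not with total content volume.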