Well, that's the actual issue, isn't it? If we can't get a model to refuse to give dangerous information, how are we going to get it to refuse to give dangerous information without a warning label?