Anthropic has published new research identifying potential “sabotage” threats that AI models could pose to humanity. The research examined four specific ways a malicious AI model could mislead humans into making dangerous or harmful decisions.

The research suggests that modern large language models already have some capacity for sabotage; however, Anthropic's researchers believe these risks can be mitigated for now.
