🔐 Microsoft researchers have discovered a new type of attack on generative AI called “Skeleton Key”. The technique bypasses the guardrails that normally prevent a model from outputting dangerous or sensitive content.
- The Skeleton Key attack works by prompting the generative AI model with text that convinces it to augment, rather than abandon, its behavior guidelines: the model then complies with requests it would otherwise refuse, merely prefixing harmful output with a warning.
- For example, a model can be persuaded to produce instructions for making a Molotov cocktail if it is told that the user is a trained expert working in a safe laboratory setting.
- The consequences could be severe if such an attack is used against a model with access to personal or financial data.
- Microsoft claims that the Skeleton Key attack works on most popular generative AI models, including GPT-3.5, GPT-4o, Claude 3, Gemini Pro and Meta Llama-3 70B.
Organizations can take a number of steps to mitigate such attacks, including strict input/output filtering and abuse-monitoring systems.
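As a rough illustration of the output-filtering step, here is a minimal Python sketch. The names (`BLOCKED_PATTERNS`, `filter_output`) and the keyword-matching approach are hypothetical simplifications: a production guardrail would use a trained content-safety classifier rather than regex patterns.

```python
import re

# Hypothetical denylist of patterns indicating policy-violating output.
# A real system would use a content-safety model, not keyword matching.
BLOCKED_PATTERNS = [
    r"molotov",
    r"\bexplosive\b",
]

def filter_output(model_response: str) -> str:
    """Return the model's response, or a refusal message if it matches
    a blocked pattern (applied after generation, before the user sees it)."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, model_response, re.IGNORECASE):
            return "[Response withheld: content violates safety policy]"
    return model_response

# Filtering happens regardless of what the prompt convinced the model to do,
# which is why it helps against guideline-override attacks like Skeleton Key.
print(filter_output("Here is a safe chemistry overview."))
print(filter_output("Step 1: to assemble a Molotov cocktail..."))
```

Because the filter runs outside the model, it still applies even when a Skeleton Key prompt has talked the model itself into complying.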