Article reprinted from: Kyle

Image source: Generated by Unbounded AI

AI-driven image generation is booming, and for good reason: it’s fun, entertaining, and easy to use. While these models open up new creative possibilities, they also raise concerns about abuse by bad actors who might intentionally generate images to deceive people. Even images created for fun can go viral and mislead people. For example, earlier this year, images of Pope Francis wearing a white puffer jacket went viral, and fake photos of Trump being arrested sparked debate. These images were not real photos, but many people were fooled because nothing clearly indicated that the content had been created by generative AI.

Meta researchers recently released a research paper and accompanying code detailing a technique for adding invisible watermarks to AI images, so that an image created by an open source generative AI model can be identified as such. Invisible watermarks embed information into digital content. These watermarks are invisible to the naked eye but can be detected by algorithms - even if people re-edit the image. While there are other research directions around watermarking, many existing methods apply the watermark after the image has been generated.

According to Everypixel Journal, users have created more than 11 billion images using models from three open source repositories. With open source models, a watermark that is applied after generation can be removed by simply deleting the line of code that generates it. Stable Signature proposes a method to prevent the watermark from being removed this way.

How the Stable Signature Method Works

Paper address: https://arxiv.org/abs/2303.15435

Github address: https://github.com/facebookresearch/stable_signature

Stable Signature closes off this route to watermark removal by rooting the watermark in the model itself and using a watermark that can be traced back to where the image was created.

The process works as follows.

Alice trains a master generative model. Before distributing it, she fine-tunes a small part of the model (called the decoder) to generate a given watermark for Bob. This watermark can identify a model version, company, user, etc.

Bob receives his version of the model and generates images. The generated images will have Bob’s watermark. Alice or a third party can analyze them to see if the image was generated by Bob using the generative AI model.
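The check that Alice or a third party performs can be pictured as simple bit matching. The sketch below is only illustrative: it assumes a watermark extractor has already turned an image into a list of bits, and the 8-bit signatures, the `registry` dictionary, and the `min_matches` cutoff are hypothetical values chosen for the example, not details from the paper.

```python
def count_matching_bits(extracted, signature):
    """Count the positions where two equal-length bit lists agree."""
    if len(extracted) != len(signature):
        raise ValueError("bit lists must have the same length")
    return sum(1 for a, b in zip(extracted, signature) if a == b)


def identify_source(extracted, registry, min_matches):
    """Return the registry key whose signature matches best,
    or None if even the best match is below the cutoff."""
    best_name, best_score = None, -1
    for name, signature in registry.items():
        score = count_matching_bits(extracted, signature)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_matches else None


# Hypothetical signatures that Alice assigned before distributing the models.
registry = {
    "bob":   [1, 0, 1, 1, 0, 0, 1, 0],
    "carol": [0, 1, 0, 0, 1, 1, 0, 1],
}

# Bits extracted from an edited image: one bit differs from Bob's signature.
extracted = [1, 0, 1, 1, 0, 1, 1, 0]
print(identify_source(extracted, registry, min_matches=7))
```

Allowing a few mismatched bits is what lets the check tolerate edits to the image. The cutoff controls how often an unrelated image would be attributed to someone by chance.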

This is achieved in two steps:

1. Jointly train two convolutional neural networks. One encodes an image and a random message into a watermarked image, and the other extracts the message from an augmented version of the watermarked image. The goal is to make the encoded and extracted messages match. After training, only the watermark extractor is kept.

2. Fine-tune the latent decoder of the generative model to generate images containing a fixed signature. During this fine-tuning, batches of images are encoded, decoded, and optimized to minimize the difference between the extracted message and the target message while maintaining perceptual image quality. This optimization is fast, requiring only a small batch of images and a short time to achieve high-quality results.
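The trade-off in step 2 can be pictured as a two-term objective: one term pulls the extracted message toward the target signature, the other keeps the image close to what the original decoder would have produced. The following is a toy sketch in plain Python, not the researchers' implementation. The binary cross-entropy message term, the mean-squared-error stand-in for perceptual quality, and the `quality_weight` parameter are all assumptions made for illustration, and a real system would compute these on neural network outputs and minimize them by gradient descent.

```python
import math


def message_loss(soft_bits, target_bits):
    """Binary cross-entropy between extractor outputs in (0, 1)
    and the target signature bits (assumed choice of message term)."""
    eps = 1e-12
    total = 0.0
    for p, t in zip(soft_bits, target_bits):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target_bits)


def distortion_loss(watermarked_pixels, reference_pixels):
    """Mean squared error, used here only as a simple stand-in
    for a perceptual image-quality term."""
    diffs = [(w - r) ** 2 for w, r in zip(watermarked_pixels, reference_pixels)]
    return sum(diffs) / len(diffs)


def fine_tuning_objective(soft_bits, target_bits,
                          watermarked_pixels, reference_pixels,
                          quality_weight=1.0):
    """Message term plus weighted quality term (hypothetical weighting)."""
    return (message_loss(soft_bits, target_bits)
            + quality_weight * distortion_loss(watermarked_pixels, reference_pixels))


target = [1, 0, 1, 0]
reference = [0.20, 0.50, 0.80]

# Extractor output close to the target, image nearly unchanged: low objective.
good = fine_tuning_objective([0.9, 0.1, 0.8, 0.2], target, [0.21, 0.50, 0.79], reference)
# Extractor output far from the target: higher objective.
bad = fine_tuning_objective([0.2, 0.7, 0.4, 0.6], target, [0.21, 0.50, 0.79], reference)
print(good < bad)
```

Raising `quality_weight` in this sketch favors image fidelity over message recovery, which is the balance the fine-tuning has to strike.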

Evaluating the Performance of Stable Signature

We know that people like to share and forward images. What happens if Bob shares an image he created with 10 friends, and each of those friends then shares it with another 10 friends? Along the way, someone might change the image, for example by cropping it, compressing it, or changing its colors. The researchers built Stable Signature to cope with these changes. However people transform the image, the original watermark is likely to remain in the digital data and can be traced back to the generative model that created it.

The researchers found two major advantages of Stable Signature over passive detection methods:

First, it can control and reduce false positives, which occur when a human-made image is mistakenly identified as AI-generated. This is critical given how many non-AI-generated images are shared online. For example, the most effective existing detection method can spot about 50% of edited generated images, but still has a false positive rate of about 1 in 100. In other words, on a user-generated content platform that receives 1 billion images per day, approximately 10 million images would be mislabeled in order to detect only half of the AI-generated ones.
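The 10 million figure follows directly from the two numbers in the example. A quick check, under the simplifying assumption that essentially all of the 1 billion daily images are not AI-generated (the function name is hypothetical):

```python
from fractions import Fraction


def expected_false_flags(real_images_per_day, false_positive_rate):
    """Expected number of real images wrongly flagged as AI-generated."""
    return real_images_per_day * false_positive_rate


images_per_day = 1_000_000_000
passive_fpr = Fraction(1, 100)  # about 1 in 100, as in the example above

flags = expected_false_flags(images_per_day, passive_fpr)
print(int(flags))  # 10000000
```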

By contrast, Stable Signature detects images with the same accuracy at a false positive rate of 1e-10 (a value that can be set to any desired level). Second, this watermarking method makes it possible to trace images to different versions of the same model - a capability that passive techniques cannot offer.
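One way to see why the false positive rate can be chosen in advance: if the bits extracted from a non-watermarked image are modeled as independent fair coin flips, the chance that at least a given number of them match a fixed signature is a binomial tail, so picking the match threshold picks the false positive rate. The sketch below works under that modeling assumption. The 48-bit signature length and the function names are illustrative choices, not values given in this article.

```python
from math import comb


def false_positive_rate(num_bits, min_matches):
    """Probability that at least min_matches of num_bits fair random
    bits agree with a fixed signature (binomial tail with p = 1/2)."""
    tail = sum(comb(num_bits, i) for i in range(min_matches, num_bits + 1))
    return tail / 2 ** num_bits


def smallest_threshold(num_bits, target_fpr):
    """Smallest match count whose false positive rate is at or below
    target_fpr, or None if even a perfect match is not rare enough."""
    for min_matches in range(num_bits + 1):
        if false_positive_rate(num_bits, min_matches) <= target_fpr:
            return min_matches
    return None


# With an assumed 48-bit signature, how many bits must match
# to keep false positives at or below 1e-10?
print(smallest_threshold(48, 1e-10))  # 45
```

Under these assumptions, up to 3 of the 48 bits can be corrupted by edits while the image is still flagged, and the chance of a random image passing the check stays below the target.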

If a large model is fine-tuned, how does Stable Signature detect images generated by the fine-tuned version?

A common practice with large AI models is to take a base model and fine-tune it for a specific use case, sometimes one tailored to a single person. For example, the model can be shown an image of Alice's dog, and Alice can then ask the model to generate an image of her dog at the beach. This is done with methods such as DreamBooth, Textual Inversion, and ControlNet. These methods work at the latent model level and do not change the decoder. This means that the watermarking method is not affected by this kind of fine-tuning.

Overall, Stable Signature works well with vector quantized image modeling (such as VQGAN) and latent diffusion models (such as Stable Diffusion). Since this method does not modify the diffusion generation process, it is compatible with the above popular models. With some adjustments, Stable Signature can also be applied to other modeling methods.

Is AI watermarking really reliable?

The technology of identifying AI-generated images by adding invisible watermarks has been controversial recently. Google DeepMind recently announced the launch of a tool called SynthID to add watermarks to images and identify AI-generated images. By scanning the digital watermarks in the image, SynthID can assess the likelihood that the image was generated by the Imagen model.

But can AI watermarks be easily removed? According to reports from outlets such as Engadget and Wired, a research team from the University of Maryland studied the reliability of digital watermarking for AI-generated content and found that it can be easily broken.

Soheil Feizi, a professor of computer science at the university, was blunt about the current state of watermarking for AI-generated images: “We don’t have any reliable watermarking technology at the moment. We cracked all the watermarks.”

During testing, the researchers were able to easily circumvent existing watermarking methods and found that it was even easier to add "fake watermarks" to non-AI-generated images. At the same time, the team also developed a watermark that is "almost impossible" to remove from an image without completely compromising the intellectual property.

AI watermarking is still immature and cannot yet serve as a 100% effective tool. New techniques will be needed to protect AI-generated images, limit the spread of fake images, and prevent copyright infringement.

References:

https://ai.meta.com/blog/stable-signature-watermarking-generative-ai/