Perfusion: Nvidia's solution to the high storage demands of AI image generation

Nvidia researchers have developed a new AI image generation technique that enables highly customized text-to-image models with minimal storage requirements.

According to a paper published on arXiv, the proposed method, called “Perfusion,” can add new visual concepts to existing models using only 100KB of parameters per concept.

Source: Nvidia Research

As the paper’s authors describe it, Perfusion works by making “small updates to the internal representation of the text-to-image model.”

More specifically, it makes carefully calculated changes to the part of the model that links text descriptions to generated visual features. By applying small, parameterized edits to the model's cross-attention layers, Perfusion changes how text inputs are converted into images.
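To make the idea of a cross-attention layer concrete, here is a minimal NumPy sketch of a single-head cross-attention step, in which image features act as queries and text-token embeddings are projected into keys and values. The dimensions and random weights are toy assumptions, not the configuration of Stable Diffusion or of Perfusion.

```python
import numpy as np

def cross_attention(image_feats, text_embs, W_q, W_k, W_v):
    """Single-head cross-attention: image features attend to text tokens."""
    Q = image_feats @ W_q.T                 # (n_pixels, d_attn) queries from the image
    K = text_embs @ W_k.T                   # (n_tokens, d_attn) keys from the text
    V = text_embs @ W_v.T                   # (n_tokens, d_attn) values from the text
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over text tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
d_text, d_img, d_attn = 16, 8, 4            # hypothetical toy sizes
n_pixels, n_tokens = 5, 3
image_feats = rng.normal(size=(n_pixels, d_img))
text_embs = rng.normal(size=(n_tokens, d_text))
W_q = rng.normal(size=(d_attn, d_img))
W_k = rng.normal(size=(d_attn, d_text))
W_v = rng.normal(size=(d_attn, d_text))

out, attn = cross_attention(image_feats, text_embs, W_q, W_k, W_v)
print(out.shape)  # (5, 4)
```

Only `W_k` and `W_v` act on the text embeddings, so editing those two matrices changes how a word is translated into visual features without touching the rest of the network.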

In other words, Perfusion does not retrain the text-to-image model from scratch. Instead, it slightly adjusts the mathematical transformations that turn words into pictures. This lets it teach the model new visual concepts without much computing power or full model retraining.
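The paper is titled "Key-Locked Rank One Editing for Text-to-Image Personalization," and the sketch below illustrates the general idea of a rank-one edit: a single outer-product update that redirects one input direction (a new concept's text embedding) to a chosen output while leaving orthogonal inputs unchanged. It is a simplified, hypothetical illustration, not the paper's exact update rule; the helper `rank_one_edit` and the vectors `e_concept` and `e_category` are our own names. The key is "locked" by mapping the new concept to the same key as a broader category word.

```python
import numpy as np

def rank_one_edit(W, e, target):
    """Return W' = W + (target - W e) e^T / (e^T e).

    W' maps e exactly to `target`; any input orthogonal to e is unchanged.
    The update is fully described by two vectors (the residual and e).
    """
    residual = target - W @ e
    return W + np.outer(residual, e) / (e @ e)

rng = np.random.default_rng(1)
d_text, d_attn = 16, 4                  # hypothetical toy sizes
W_k = rng.normal(size=(d_attn, d_text))
W_v = rng.normal(size=(d_attn, d_text))

e_concept = rng.normal(size=d_text)     # embedding of the new concept token
e_category = rng.normal(size=d_text)    # embedding of a broader category word

# Key-locking (simplified): the concept produces the same key as its category.
W_k_new = rank_one_edit(W_k, e_concept, W_k @ e_category)
# The value target would be learned from the user's images; random here.
v_target = rng.normal(size=d_attn)
W_v_new = rank_one_edit(W_v, e_concept, v_target)

# An input orthogonal to e_concept is mapped exactly as before.
x = rng.normal(size=d_text)
x_orth = x - (x @ e_concept) / (e_concept @ e_concept) * e_concept
print(np.allclose(W_k_new @ x_orth, W_k @ x_orth))  # True
```

Because the edit is rank one, storing it costs only two vectors per matrix rather than a full copy of the weights, which is the intuition behind the small per-concept footprint.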

The Perfusion method requires only 100KB per concept.

Perfusion achieves these results with two to five orders of magnitude fewer parameters than competing technologies.

While other approaches may require hundreds of megabytes to gigabytes of storage per concept, Perfusion requires only 100KB, roughly the size of a small image shared in a messaging app such as WhatsApp.
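To put these storage figures in perspective, the short calculation below converts size ratios into orders of magnitude. The comparison sizes are illustrative examples chosen to span the ranges quoted above, not measurements of specific competing methods, and the layer shape is a hypothetical example.

```python
import math

PERFUSION_BYTES = 100e3  # 100KB per concept, as reported

def orders_of_magnitude(size_bytes):
    """How many powers of ten larger a file is than a 100KB concept."""
    return math.log10(size_bytes / PERFUSION_BYTES)

# Hypothetical comparison sizes (decimal units).
others = {"10 MB": 10e6, "300 MB": 300e6, "2 GB": 2e9, "10 GB": 10e9}
for label, size in others.items():
    print(f"{label}: ~{orders_of_magnitude(size):.1f} orders of magnitude larger")

# A dense d_out x d_in matrix stores d_out * d_in numbers;
# a rank-one update of it stores only d_out + d_in.
d_out, d_in = 320, 768   # hypothetical layer shape
print(d_out * d_in, d_out + d_in)  # 245760 1088
```

A 10 MB file is two orders of magnitude larger than 100KB and a 10 GB file is five, which matches the "two to five orders of magnitude" range.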

This drastic reduction could make deploying highly customized AI art models much more feasible.

According to co-author Gal Chechik:

“Not only does Perfusion enable more accurate personalization at a fraction of the model size, but it also enables the use of more complex prompts as well as combining separately learned concepts at inference time.”

The method can generate creative images, such as “teddy bear sailing in a teapot”, using the individually learned personalized concepts of “teddy bear” and “teapot”.
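The toy sketch below shows one reason separately learned rank-one updates can be combined at inference time: if two concepts' embeddings are orthogonal, adding both updates to the same weight matrix leaves each concept's mapping intact. Perfusion itself uses a gating mechanism to control how much each concept influences the output; the orthogonality here is a simplifying assumption, and the helper `rank_one_delta` and the names `e_bear` and `e_pot` are hypothetical.

```python
import numpy as np

def rank_one_delta(W, e, target):
    """Update term that, added to W, maps e to target (hypothetical helper)."""
    return np.outer(target - W @ e, e) / (e @ e)

rng = np.random.default_rng(2)
d_text, d_attn = 16, 4                  # hypothetical toy sizes
W_v = rng.normal(size=(d_attn, d_text))

# Two concept embeddings ("teddy bear", "teapot"), made orthogonal for this toy.
e_bear = rng.normal(size=d_text)
e_pot = rng.normal(size=d_text)
e_pot -= (e_pot @ e_bear) / (e_bear @ e_bear) * e_bear

v_bear = rng.normal(size=d_attn)        # stand-ins for learned values
v_pot = rng.normal(size=d_attn)

# Each delta is computed independently, then both are added at inference time.
W_combined = (W_v
              + rank_one_delta(W_v, e_bear, v_bear)
              + rank_one_delta(W_v, e_pot, v_pot))

print(np.allclose(W_combined @ e_bear, v_bear),
      np.allclose(W_combined @ e_pot, v_pot))  # True True
```

When embeddings are not orthogonal, the two updates interfere, which is one motivation for mechanisms such as gating.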


Possibilities for efficient personalization

Because Perfusion can personalize AI models using only 100KB per concept, it opens up a wide range of potential applications:

This approach paves the way for individuals to easily customize text-to-image models with new objects, scenes, or styles without expensive retraining. Because each concept needs only a 100KB parameter update, models customized this way could run on consumer devices, enabling on-device image generation.

One of the most compelling aspects of this technology is its potential for sharing and collaboration around AI models. Users can share their personalized concepts as small file attachments instead of cumbersome full model checkpoints.

In terms of distribution, models tailored to specific organizations can be more easily disseminated or deployed at the edge. As text-to-image generation becomes more mainstream, the ability to achieve such large size reductions without sacrificing functionality will be critical.

However, it is worth noting that Perfusion personalizes an existing text-to-image model rather than providing generative capabilities on its own; it still relies on a pretrained base model.

Limitations and release

While promising, the technique has some limitations. The authors note that the choice of keys made during training can sometimes cause a concept to overgeneralize. More research is also needed to seamlessly combine multiple personalized concepts in a single image.

The authors state that Perfusion's code will be available on their project page, indicating an intention to release the method publicly, possibly after peer review and formal publication. The specifics of public availability remain unclear, however, because the work is currently posted only on arXiv, a platform where researchers share papers before formal peer review and publication in journals or conferences.

While Perfusion’s code is not yet available, the authors' stated plans suggest that this efficient personalization method could eventually reach developers, industry, and creators.

As AI art platforms like Midjourney, DALL-E 2, and Stable Diffusion grow, technologies that give users greater control may be critical to real-world deployments. With clever efficiency improvements like Perfusion, Nvidia seems determined to maintain its edge in a rapidly evolving landscape.

#Nvidia #ImageGeneration