Researchers at the Beijing Academy of Artificial Intelligence have developed Omnigen, a new AI model that may be an all-in-one source for image creation. Unlike previous workflows that required users to load separate image generators, ControlNets, IP-Adapters, and inpainting models, Omnigen functions as a comprehensive creative suite that handles everything from basic image editing to complex visual reasoning tasks within a single, streamlined framework.
Omnigen relies on two core components: a variational autoencoder (VAE) that compresses images into compact latent representations, and a transformer model that processes mixed text and image inputs in a single sequence. According to the researchers, the model was trained on a dataset of roughly 100 million images, allowing it to handle tasks ranging from text-to-image generation to sophisticated photo editing and inpainting.
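To make the division of labor between the two components more concrete, here is a minimal sketch of how many image tokens the transformer would see for a given resolution. It assumes a VAE that downsamples each side by 8 and a transformer that groups the latent grid into 2x2 patches; both values are common in latent image models but are assumptions here, not figures stated in this article, and the function name is hypothetical.

```python
def latent_token_count(height: int, width: int,
                       vae_factor: int = 8, patch_size: int = 2) -> int:
    """Estimate how many image tokens a transformer sees for one image.

    vae_factor and patch_size are illustrative assumptions, not values
    confirmed by the article.
    """
    step = vae_factor * patch_size
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    # The VAE shrinks each side of the image by vae_factor.
    latent_h, latent_w = height // vae_factor, width // vae_factor
    # The transformer then treats each patch_size x patch_size block as one token.
    return (latent_h // patch_size) * (latent_w // patch_size)


print(latent_token_count(1024, 1024))  # 4096 under the assumptions above
```

Under these assumptions, a 1024x1024 image becomes a 128x128 latent grid and then 4,096 tokens, which is why working in latent space keeps the transformer's input manageable.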
One of the most striking features of Omnigen is its ability to understand context. When prompted to identify a place to wash hands, for example, it recognizes and highlights sinks in images, showcasing a level of reasoning that approaches human-like understanding. Users can interact with Omnigen much as they would with ChatGPT, generating and modifying images through plain-language instructions without needing to deal with segmentation, masking, or other complex techniques.
This breakthrough opens up new possibilities for more natural interaction between human creators and AI tools. The researchers have also embedded Microsoft’s Phi-3 LLM into Omnigen and trained it to apply a chain-of-thought approach to image generation, breaking down complex creative tasks into smaller, more manageable steps.
This methodical process allows for unprecedented control over the creative workflow, although output quality currently matches rather than exceeds standard generation methods. Looking ahead, researchers are exploring ways to enhance Omnigen’s capabilities, with future iterations potentially including improved handling of text-heavy images and more sophisticated reasoning abilities.
Omnigen is open source and can be run locally. Users who don't have the required hardware can still try a few free generations on servers provided by Hugging Face. The model accepts up to three images of context along with a generous amount of text, so users can supply a very detailed set of instructions to generate or edit images.
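As a rough illustration of what a multi-image request might look like, the sketch below assembles a prompt that refers to several context images and enforces the three-image limit mentioned above. The `build_request` helper and the `[image N]` shorthand are hypothetical, and the `<img><|image_N|></img>` placeholder style is an assumption modeled on the project's published examples, so check it against the current release before relying on it.

```python
MAX_CONTEXT_IMAGES = 3  # limit described in the article


def build_request(instruction: str, image_paths: list) -> dict:
    """Turn a plain instruction plus image paths into a prompt payload.

    Hypothetical helper: "[image N]" in the instruction is replaced with
    an assumed placeholder tag for the N-th context image.
    """
    if len(image_paths) > MAX_CONTEXT_IMAGES:
        raise ValueError(f"at most {MAX_CONTEXT_IMAGES} context images are supported")
    prompt = instruction
    for i in range(1, len(image_paths) + 1):
        marker = f"[image {i}]"
        if marker not in prompt:
            raise ValueError(f"instruction never mentions {marker}")
        # Assumed placeholder syntax; verify against the model's documentation.
        prompt = prompt.replace(marker, f"<img><|image_{i}|></img>")
    return {"prompt": prompt, "input_images": list(image_paths)}


request = build_request(
    "Place the person in [image 1] beside the dog in [image 2].",
    ["person.png", "dog.png"],
)
print(request["prompt"])
```

The resulting dictionary could then be passed to whatever inference entry point the local installation exposes; that step is left out because it depends on the installed version.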
While Omnigen may not currently outperform other image generation models like Flux or SD 3.5 in terms of quality, its strength lies in accuracy and prompt adherence, making it a powerful and user-friendly option for AI image editing. It could be particularly useful for beginners testing the waters of open-source AI, as well as professional AI artists looking to simplify their workflows by combining its capabilities with their own.
The post Introducing Omnigen: The AI-Powered Image Editing Tool first appeared on CoinBuzzFeed.