Tailoring the Fabric of Generative AI: FABRIC is an AI Approach That Personalizes Diffusion Models with Iterative Feedback
Generative AI is a term we are all familiar with by now. These models have advanced rapidly in recent years and have become a key tool in a wide range of applications.
The stars of the generative AI show are diffusion models. They have emerged as a powerful class of generative models, revolutionizing image synthesis and related tasks, and they have shown remarkable performance in generating high-quality, diverse images. Unlike traditional generative models such as GANs and VAEs, diffusion models work by iteratively refining a noise source, which allows for stable and coherent image generation.
Diffusion models have gained significant traction due to their ability to generate high-fidelity images with enhanced stability and reduced mode collapse during training. This has led to their widespread adoption and application across diverse domains, including image synthesis, inpainting, and style transfer.
However, they are not perfect. Despite their impressive capabilities, one of the challenges with diffusion models lies in effectively steering the model toward a specific desired output based on a textual description. Precisely describing your preferences through a text prompt is often frustrating: sometimes the prompt simply is not expressive enough, and sometimes the model insists on ignoring it. As a result, you usually need to refine the generated image before it is usable.
But you know what you wanted the model to draw. So, in theory, you are the best person to evaluate the quality of the generated image, that is, how closely it resembles what you imagined. What if we could integrate this feedback into the image generation pipeline so the model could understand what we want to see? Time to meet FABRIC.
FABRIC (Feedback via Attention-Based Reference Image Conditioning) is a novel approach to enable the integration of iterative feedback into the generative process of diffusion models.
FABRIC utilizes positive and negative feedback images, gathered from previous generations or from human input, as reference images that condition future results. This iterative workflow lets generated images be fine-tuned to the user's preferences, providing a more controllable and interactive text-to-image generation process.
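To make the workflow concrete, here is a minimal sketch of that feedback loop. The `generate` and `collect_user_feedback` functions are illustrative stubs standing in for a FABRIC-enabled diffusion pipeline and a feedback UI; their names and signatures are assumptions, not an actual API.

```python
# Sketch of the iterative feedback loop: each round is conditioned on all
# liked/disliked images gathered so far. Stubs replace the real pipeline.
import random

def generate(prompt, liked_images, disliked_images, n=4):
    # Stub: a real pipeline would condition the diffusion process on the
    # liked/disliked reference images; here we only return placeholder ids.
    return [f"image_{random.randrange(10**6)}" for _ in range(n)]

def collect_user_feedback(images):
    # Stub: a real UI would let the user mark each image; here we pick randomly.
    liked = [img for img in images if random.random() > 0.5]
    return {"liked": liked, "disliked": [img for img in images if img not in liked]}

liked, disliked = [], []
prompt = "a cozy cabin in a snowy forest at dusk"

for round_idx in range(3):
    # Condition each round's generation on the accumulated feedback.
    images = generate(prompt, liked_images=liked, disliked_images=disliked)
    feedback = collect_user_feedback(images)
    liked.extend(feedback["liked"])
    disliked.extend(feedback["disliked"])
```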
FABRIC is inspired by ControlNet, which introduced the ability to generate new images similar to reference images. FABRIC leverages the self-attention module in the U-Net, allowing it to “pay attention” to other pixels in the image and inject additional information from a reference image. The keys and values for reference injection are computed by passing the noised reference image through the U-Net of Stable Diffusion. These keys and values are stored in the self-attention layers of the U-Net, allowing the denoising process to attend to the reference image and incorporate semantic information.
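The sketch below illustrates the key/value injection idea on a toy self-attention layer (not the official FABRIC code or the Stable Diffusion U-Net): one pass over the noised reference latents caches its keys and values, and subsequent passes over the latents being denoised attend to both their own tokens and the cached reference tokens. The class name and dimensions are illustrative assumptions.

```python
# Minimal sketch: a self-attention layer that caches keys/values from a
# reference pass and lets later passes attend to them as well.
import torch
import torch.nn as nn

class ReferenceAwareSelfAttention(nn.Module):
    """Toy stand-in for a U-Net self-attention block."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5
        self.cached_kv = None  # (keys, values) from the reference pass

    def forward(self, x: torch.Tensor, cache_reference: bool = False) -> torch.Tensor:
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        if cache_reference:
            # Reference pass: store keys/values of the noised reference image.
            self.cached_kv = (k.detach(), v.detach())
        elif self.cached_kv is not None:
            # Generation pass: current tokens also attend to the reference tokens.
            ref_k, ref_v = self.cached_kv
            k = torch.cat([k, ref_k], dim=1)
            v = torch.cat([v, ref_v], dim=1)

        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

# Usage: fill the cache with the (flattened) reference latents, then run the
# latents being denoised, which now incorporate information from the reference.
layer = ReferenceAwareSelfAttention(dim=64)
ref_tokens = torch.randn(1, 256, 64)   # hypothetical flattened reference latents
cur_tokens = torch.randn(1, 256, 64)   # latents currently being denoised
_ = layer(ref_tokens, cache_reference=True)
out = layer(cur_tokens)
```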
Moreover, FABRIC is extended to incorporate multi-round positive and negative feedback, where separate U-Net passes are performed for each liked and disliked image, and the attention scores are reweighted based on the feedback. The feedback process can be scheduled according to denoising steps, allowing for iterative refinement of the generated images.
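Building on the sketch above, the following function hints at how attention toward multiple feedback images could be reweighted and gated by denoising progress. The weighting scheme (adding `log(weight)` to the attention logits), the `feedback_until` schedule, and all names are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: attend to several cached feedback images, scaling the attention paid
# to each one by a per-image weight, and only during part of the denoising run.
import math
import torch

def feedback_attention(q, k, v, feedback, scale, step, num_steps, feedback_until=0.8):
    """q, k, v: projections of the current latents, shape (batch, tokens, dim).
    feedback: list of (ref_k, ref_v, weight) tuples; weight > 1 for liked
    images, weight < 1 for disliked ones."""
    logits = q @ k.transpose(-2, -1) * scale  # attention to the image itself

    # Only inject feedback during the scheduled portion of the denoising steps.
    if feedback and step / num_steps < feedback_until:
        extra_logits, extra_v = [], []
        for ref_k, ref_v, weight in feedback:
            ref_logits = q @ ref_k.transpose(-2, -1) * scale
            # Adding log(weight) multiplies this reference's attention scores
            # by `weight` before the softmax normalization.
            extra_logits.append(ref_logits + math.log(weight))
            extra_v.append(ref_v)
        logits = torch.cat([logits] + extra_logits, dim=-1)
        v = torch.cat([v] + extra_v, dim=1)

    attn = torch.softmax(logits, dim=-1)
    return attn @ v

# Example: two liked images (weight 1.5) and one disliked image (weight 0.5).
dim, scale = 64, 64 ** -0.5
q, k, v = (torch.randn(1, 256, dim) for _ in range(3))
liked = [(torch.randn(1, 256, dim), torch.randn(1, 256, dim), 1.5) for _ in range(2)]
disliked = [(torch.randn(1, 256, dim), torch.randn(1, 256, dim), 0.5)]
out = feedback_attention(q, k, v, liked + disliked, scale, step=5, num_steps=50)
```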
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled “Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning.” His research interests include deep learning, computer vision, video encoding, and multimedia networking.