Meet DragonDiffusion: A Fine-Grained Image Editing Method Enabling Drag-style Manipulation on Diffusion Models 

Large-scale text-to-image (T2I) diffusion models, which generate images conditioned on a given text prompt, have developed rapidly thanks to the availability of large amounts of training data and massive compute capacity. Nonetheless, this generative capability is hard to control: crafting prompts that produce images matching what the user has in mind is difficult, and so is further modifying an existing image.

Image editing has more varied requirements than image generation. Because their latent space is compact and easy to manipulate, GAN-based methods have found widespread application in image editing. Diffusion models, meanwhile, are more stable to train and produce higher-quality output than GANs.

A new research paper by Peking University and ARC Lab, Tencent PCG, asks whether diffusion models can offer the same drag-style editing capabilities.


The fundamental difficulty in implementing this is that drag-style editing requires a compact and editable latent space. Many diffusion-based image editing approaches have been developed based on the similarity between intermediate text and image features. Studies find a strong local resemblance between word and object features in the cross-attention map, which can be exploited for editing.
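To make that observation concrete, here is a minimal sketch in plain PyTorch of how a per-token cross-attention map over image locations arises; the tensor shapes and the token index are hypothetical stand-ins, not the paper's code:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: a 16x16 grid of image features and 77 text tokens.
num_pixels, num_tokens, dim = 16 * 16, 77, 64
image_feats = torch.randn(num_pixels, dim)  # queries: intermediate image features
text_feats = torch.randn(num_tokens, dim)   # keys: text (prompt) embeddings

# Standard scaled dot-product cross-attention scores.
attn = F.softmax(image_feats @ text_feats.T / dim ** 0.5, dim=-1)  # (pixels, tokens)

# The column for one token is a spatial map that tends to highlight the
# object the word refers to -- the word/object resemblance used for editing.
token_idx = 5  # hypothetical index of an object word in the prompt
word_map = attn[:, token_idx].reshape(16, 16)
print(word_map.shape)  # torch.Size([16, 16])
```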

Just as there is a robust correspondence between text features and intermediate image features in the large-scale T2I diffusion generation process, there is also a robust correspondence between intermediate image features themselves. This property was investigated in DIFT, which shows that the correspondence is strong enough to directly match similar regions across images. Building on this high similarity between image features, the team uses it to accomplish image editing.
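This DIFT-style correspondence can be sketched as nearest-neighbour matching in the space of intermediate UNet features. The snippet below is an illustrative sketch, not DIFT's released code; `feats_a` and `feats_b` stand in for diffusion features extracted from two images at some denoising step:

```python
import torch
import torch.nn.functional as F

def match_point(feats_a, feats_b, ij):
    """Return the location in image B whose feature is most similar
    (cosine similarity) to the feature of image A at pixel ij.

    feats_a, feats_b: (C, H, W) intermediate UNet features (stand-ins here).
    """
    c, _, w = feats_b.shape
    query = F.normalize(feats_a[:, ij[0], ij[1]], dim=0)  # (C,)
    keys = F.normalize(feats_b.reshape(c, -1), dim=0)     # (C, H*W)
    sim = query @ keys                                    # cosine similarities
    return divmod(sim.argmax().item(), w)                 # (row, col) in image B

# Toy usage with random tensors standing in for real diffusion features.
fa, fb = torch.randn(256, 32, 32), torch.randn(256, 32, 32)
print(match_point(fa, fb, (10, 20)))
```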

To adapt the diffusion model’s intermediate representation, the researchers devise a classifier-guidance-based strategy called DragonDiffusion that converts the editing signals into gradients via a feature correspondence loss. The proposed approach uses two groups of features (i.e., guidance features and generation features) at different stages of diffusion. With robust image feature correspondence as their guide, they revise and refine the generation features based on the guidance features. The same strong correspondence also helps preserve content consistency between the edited image and the original.
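Below is a minimal sketch of this guidance mechanism under loudly toy assumptions: a linear `denoiser` stands in for the diffusion UNet, and a plain MSE stands in for the paper's feature correspondence loss. At each step, the loss between generation features and guidance features is differentiated with respect to the latent, and that gradient steers the update, as in classifier guidance:

```python
import torch

def correspondence_loss(gen_feats, guide_feats):
    # Toy stand-in: plain MSE; the real loss matches source/target regions.
    return (gen_feats - guide_feats).pow(2).mean()

def guided_step(latent, guide_feats, denoiser, step_size=0.1, weight=1.0):
    """One toy denoising step steered by the gradient of the feature loss."""
    latent = latent.detach().requires_grad_(True)
    gen_feats = denoiser(latent)                     # stand-in for UNet features
    loss = correspondence_loss(gen_feats, guide_feats)
    grad = torch.autograd.grad(loss, latent)[0]      # editing signal as a gradient
    # Classifier-guidance-style update: move the latent against the loss.
    return (latent - step_size * weight * grad).detach()

# Toy usage: a linear "denoiser" and random guidance features.
denoiser = torch.nn.Linear(64, 64)
latent, guide_feats = torch.randn(1, 64), torch.randn(1, 64)
for _ in range(5):
    latent = guided_step(latent, guide_feats, denoiser)
print(latent.shape)  # torch.Size([1, 64])
```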

The researchers also note that a concurrent work, DragDiffusion, investigates the same topic. It uses LoRA to keep the result looking like the original and carries out the edit by optimizing a single intermediate step of the diffusion procedure. Instead of fine-tuning or training the model as DragDiffusion does, the method proposed in this work is based on classifier guidance, with all editing and content consistency signals coming directly from the image.

DragonDiffusion derives all content editing and preservation signals from the original image. Without any additional fine-tuning or training, the T2I generation capability of diffusion models can be directly transferred to image editing applications.

Extensive experiments show that the proposed DragonDiffusion can perform a wide range of fine-grained image editing tasks, such as resizing and repositioning objects, changing their appearance, and dragging their contents.


Check out the Paper and GitHub Link. Don’t forget to join our 25k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com



Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier in today’s evolving world.


