AI Research on Interpolating between Images with the Help of Diffusion Models

Artificial Intelligence is a leading topic of discussion among developers and researchers. From Natural Language Processing and Natural Language Understanding to Computer Vision, AI is changing almost every domain. Recently introduced text-to-image models like DALL-E have been successful in generating striking images from textual prompts. Although image creation and manipulation have advanced considerably, one area that still needs more research is interpolation between two input images. The image-generating pipelines currently in use do not support such interpolations.

Adding interpolation to image-generating models could enable new applications. Recently, a team of researchers from MIT CSAIL released a research paper that addresses the issue. It proposes a strategy for producing high-quality interpolations across images from various domains and layouts using pre-trained latent diffusion models, in a zero-shot manner. Their strategy works in the generative model’s latent space, interpolating between the corresponding latent representations of the two input images.

The interpolation is carried out at a series of progressively lower noise levels, where noise refers to a random perturbation that is applied to the latent vectors and affects the appearance of the resulting image. After interpolating at a given noise level, the researchers denoise the interpolated representations, which reduces the impact of the added noise and improves the quality of the interpolated images.
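
The coarse-to-fine procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the article does not say which interpolation formula is used, so spherical linear interpolation (a common choice for Gaussian-like latents) is assumed here. The noising step uses the standard diffusion form with a signal fraction `alpha_bar`, and both parent latents are assumed to receive the same noise sample. The names `slerp`, `noisy_midpoint`, and `interpolate_sequence` are hypothetical, and `denoise` stands in for a real latent diffusion denoiser.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two latents of equal shape."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        # Nearly parallel latents: fall back to plain linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def noisy_midpoint(z0, z1, alpha_bar, denoise, rng):
    """Noise both parents to the same level, interpolate halfway, then denoise.

    alpha_bar is the fraction of signal kept (smaller means noisier).
    Using one shared noise sample for both parents is an assumption.
    """
    eps = rng.standard_normal(z0.shape)
    scale, sigma = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    z0_t = scale * z0 + sigma * eps
    z1_t = scale * z1 + sigma * eps
    return denoise(slerp(z0_t, z1_t, 0.5), alpha_bar)

def interpolate_sequence(z0, z1, alpha_bars, denoise, rng):
    """Insert midpoints between neighbouring frames, one pass per noise level.

    alpha_bars should be ordered from noisiest to least noisy, so early
    midpoints are generated with more freedom and later ones stay close
    to their neighbours.
    """
    frames = [z0, z1]
    for alpha_bar in alpha_bars:
        refined = [frames[0]]
        for left, right in zip(frames[:-1], frames[1:]):
            refined.append(noisy_midpoint(left, right, alpha_bar, denoise, rng))
            refined.append(right)
        frames = refined
    return frames
```

With `k` noise levels this sketch yields `2**k + 1` frames, with the two input latents as the endpoints. In a real pipeline, `denoise` would run the pre-trained model's reverse process from the given noise level, and the resulting latents would be decoded back into images.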

The denoising stage is conditioned on interpolated text embeddings obtained through textual inversion. Textual inversion adapts the text embedding to each input image, so the text conditioning reflects that image's visual features and helps the model capture the properties the interpolation should preserve. Subject poses are also incorporated to guide the interpolation. Because they provide information about the positioning and orientation of objects or people in the photos, they help the model produce more consistent and realistic interpolations.

The approach can generate multiple candidate interpolations, which improves output quality and gives the user flexibility. Using CLIP, a neural network that relates the content of images and text, these candidates can be compared and ranked, and the best interpolation can be chosen based on particular requirements or user preferences. The team has shown that the method delivers convincing interpolations in a number of settings, including variations in subject pose, image style, and image content.
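
The candidate selection step can be illustrated with a small ranking routine. This is a hedged sketch, not the paper's exact scoring rule: it assumes each candidate image and a target description have already been embedded with CLIP, and it ranks candidates by cosine similarity to the target embedding. The function name `rank_candidates` and the toy vectors are hypothetical.

```python
import numpy as np

def rank_candidates(candidate_embs, target_emb):
    """Rank candidates by cosine similarity to a target embedding.

    candidate_embs: array of shape (n_candidates, dim), assumed to be
        CLIP image embeddings of the candidate interpolations.
    target_emb: array of shape (dim,), assumed to be a CLIP embedding
        of the desired properties (for example, a text prompt).
    Returns (order, scores), where order lists candidate indices from
    best to worst.
    """
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb)
    scores = c @ t
    order = np.argsort(-scores)
    return order, scores

# Toy 2-D vectors standing in for real CLIP embeddings.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = np.array([1.0, 0.1])
order, scores = rank_candidates(candidates, target)
best = int(order[0])  # index of the highest-scoring candidate
```

A user could still override the automatic choice and pick manually from the ranked list, which matches the flexibility the article describes.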

The team notes that conventional quantitative metrics such as FID (Fréchet Inception Distance), which are commonly used to evaluate the quality of generated images, are insufficient for measuring the quality of interpolations, because interpolations have distinct characteristics and should be assessed differently from individually generated images. The pipeline is easy to deploy and gives the user considerable flexibility through text conditioning, noise scheduling, and the option to choose manually among the generated candidates.

In conclusion, this study tackles a problem that has received little attention in the field of image editing. The strategy relies on latent diffusion models that have already been trained, and the authors compare it against other interpolation methods, presenting qualitative results to show its effectiveness.




Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.

