Google DeepMind Researchers Unveil Multistep Consistency Models: A Machine Learning Approach that Balances Speed and Quality in AI Sampling

Diffusion models have gained prominence in image, video, and audio generation, but their sampling process is computationally expensive compared to training. Consistency Models offer faster sampling at the cost of image quality, and come in two variants: Consistency Training (CT) and Consistency Distillation (CD). TRACT focuses on distillation, dividing the diffusion trajectory into stages to enhance performance. However, neither Consistency Models nor TRACT match the sample quality of standard diffusion models.

Prior work includes Consistency Models and TRACT. The former learns to map any point on the diffusion trajectory directly to clean data, simplifying the modeling task and enabling sampling in very few steps, while the latter focuses on distillation, dividing the trajectory into stages and progressively reducing them to one or two for sampling. DDIM showed that deterministic samplers degrade more gracefully than stochastic ones when sampling steps are limited. Other approaches include second-order Heun samplers, different SDE integrators, specialized architectures, and Progressive Distillation to reduce model evaluations and sampling steps.
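To make the DDIM point concrete, here is a minimal sketch of one deterministic DDIM update under a variance-preserving schedule; the function name and the noise-prediction interface are illustrative assumptions, not details from the paper.

```python
def ddim_step(z_t, eps_pred, alpha_bar_t, alpha_bar_s):
    """One deterministic DDIM step from time t to an earlier time s.

    z_t:         noisy sample at time t
    eps_pred:    the network's noise prediction at (z_t, t)
    alpha_bar_*: cumulative signal levels of a variance-preserving schedule
    """
    # Clean-sample estimate implied by the noise prediction.
    x0_pred = (z_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    # Re-noise the estimate to time s with the *same* predicted noise;
    # omitting the stochastic term is what makes the sampler deterministic.
    return alpha_bar_s ** 0.5 * x0_pred + (1 - alpha_bar_s) ** 0.5 * eps_pred
```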

Researchers from Google DeepMind have proposed a machine learning method that unifies Consistency Models and TRACT to narrow the performance gap between standard diffusion models and low-step variants. It relaxes the single-step constraint, allowing 4, 8, or 16 function evaluations, and generalizes both methods by adapting step-schedule annealing and synchronized dropout from consistency modeling. Multistep Consistency Models split the diffusion process into segments, improving performance with fewer steps. A deterministic sampler called Adjusted DDIM (aDDIM) corrects integration errors to produce sharper samples.
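As an illustration of how relaxing the single-step constraint plays out at sampling time, the sketch below runs a model across a handful of segment boundaries. The `model(z, t)` interface returning a clean-sample estimate, the variance-preserving re-noising, and the fresh Gaussian noise at each boundary are assumptions in the style of generic consistency-model sampling, not the authors' exact aDDIM procedure.

```python
import torch

@torch.no_grad()
def few_step_sample(model, z, timesteps, alpha_bar):
    """Illustrative few-step sampler with len(timesteps) evaluations
    (e.g. 4, 8, or 16 segment boundaries, ordered high noise -> low).

    Assumes model(z, t) returns a clean-sample estimate and alpha_bar(t)
    gives the cumulative signal level of a variance-preserving schedule.
    """
    x_pred = model(z, timesteps[0])
    for t_next in timesteps[1:]:
        # Re-noise the clean estimate to the next segment boundary ...
        noise = torch.randn_like(z)
        z = (alpha_bar(t_next) ** 0.5 * x_pred
             + (1 - alpha_bar(t_next)) ** 0.5 * noise)
        # ... and let the model jump to a clean estimate again.
        x_pred = model(z, t_next)
    return x_pred
```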

Multistep Consistency Models divide the diffusion process into equal segments to simplify the modeling task. Training uses a consistency loss that approximates path integrals by minimizing pairwise discrepancies; the loss is computed in z-space but re-parametrized in x-space for interpretability. A focus on v-loss prevents collapse to degenerate solutions, and as the number of steps increases, the model converges to a standard diffusion model. The authors hypothesize quicker convergence through fine-tuning from a pretrained diffusion model, and the approach offers a direct trade-off between sample quality and sampling time as the step count grows.
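The following sketch shows the general shape of such a consistency-style training step over one segment. The `teacher_step` ODE integrator (standing in for one deterministic teacher step such as aDDIM), the EMA/stop-gradient target network, and the plain mean-squared error in place of the paper's exact weighted v-loss are all hypothetical simplifications for illustration.

```python
import torch
import torch.nn.functional as F

def consistency_segment_loss(student, ema_student, teacher_step,
                             x, t, s, alpha_bar):
    """One distillation step: make the student's clean-sample estimate at
    time t agree with a frozen copy's estimate one teacher ODE step later.

    t and s are adjacent points inside the same trajectory segment (s < t).
    """
    noise = torch.randn_like(x)
    # Diffuse clean data to time t under a variance-preserving schedule.
    z_t = alpha_bar(t) ** 0.5 * x + (1 - alpha_bar(t)) ** 0.5 * noise

    with torch.no_grad():
        z_s = teacher_step(z_t, t, s)       # one deterministic teacher step
        target = ema_student(z_s, s)        # stop-gradient consistency target

    pred = student(z_t, t)                  # student's x-space estimate
    return F.mse_loss(pred, target)
```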

The experiments demonstrate that Multistep Consistency Models achieve state-of-the-art FID scores on ImageNet64, surpassing Progressive Distillation (PD) across various step counts. On ImageNet128, Multistep Consistency Models likewise outperform PD. Qualitatively, comparisons reveal only minor differences in sample detail between Multistep Consistency Models and standard diffusion models on text-to-image tasks. These results highlight the efficacy of Multistep Consistency Models in improving sample quality and efficiency over existing methods.

In conclusion, the researchers introduce Multistep Consistency Models, unifying consistency models and TRACT to narrow the performance gap between standard diffusion and few-step sampling. The method offers a direct trade-off between sample quality and speed, reaching performance comparable to standard diffusion in as few as eight steps. This unification significantly improves sample quality and efficiency in generative modeling tasks.


Check out the Paper. All credit for this research goes to the researchers of this project.

Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.

