Converting 2D images into 3D objects for the purpose of text-to-3D generation is a daunting task. This is mainly because the 2D diffusion models learn only the view-agnostic priors and do not have an understanding of the 3D space during lifting. An outcome of this limitation is the multi-view inconsistency problem, i.e., the 3D object is not consistent from all viewpoints. For example, if we lift a 2D image of a cube into 3D space, the model might generate a cube that is perfect from one perspective but distorted from others.
To address this issue of geometric inconsistency, a group of researchers has introduced a new method called SweetDreamer, which adds well-defined 3D shapes during the lifting and then aligns the 2D geometric priors in diffusion models with the same. The model achieves this by fine-tuning the 2D diffusion model to be viewpoint-aware (to understand how the object’s appearance changes depending on the viewpoint) and produce view-specific coordinate maps of canonically oriented 3D objects. This approach is very effective at producing 3D objects that are consistent from all viewpoints.
The researchers have realized that the main reason behind 3D inconsistent results is due to geometric inconsistency, and therefore, their goal is to equip 2D priors with the ability to generate 3D objects that look the same from all viewpoints while retaining their generalizability.
The method proposed by the researchers leverages a comprehensive 3D dataset comprising diverse canonically oriented and normalized 3D models. Depth maps are rendered from random angles and converted into canonical coordinates maps. Then, they fine-tune the 2D diffusion model to produce the coordinate map aligned with a specific view, eventually aligning the geometric priors in 2D diffusion. Finally, the aligned geometric priors can be smoothly integrated into various text-to-3D systems, effectively reducing inconsistency issues and producing diverse, high-quality 3D content.
DMTet and NeRF are two common 3D representations used in text-to-3D generation. In the research paper, the authors showed that their aligned geometric priors can be integrated into both DMTet-based and NeRF-based text-to-3D pipelines to improve the quality of the generated 3D objects. This demonstrates the generality of their approach and its potential to enhance the performance of a wide range of text-to-3D systems.
Due to the lack of well-established metrics to evaluate the results of text-to-3D processes, the researchers focused on evaluating the multi-view consistency of the 3D results. They randomly selected 80 prompts from the DreamFusion gallery and performed text-to-3D generation using each method. 3D inconsistencies were then manually checked to report the success rate. The researchers found that their method significantly outperforms other methods. Their success rates were above 85% in both pipelines (DMTet and NeRF), while the other methods scored around 30%.
In conclusion, the SweetDreamers method presents a novel way of achieving state-of-the-art performance in text-to-3D generation. It can generate results from a wide array of prompts that are free from the issue of multi-view inconsistencies. It gives a better performance compared to other previous methods, and the researchers believe that their work would open up a new direction of using limited 3D data to enhance 2D diffusion priors for text-to-3D generation.
Check out the Paper and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 31k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
If you like our work, you will love our newsletter..
We are also on WhatsApp. Join our AI Channel on Whatsapp..
I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.
Credit: Source link
Comments are closed.