Meet One-2-3-45++: An Innovative Artificial Intelligence Method that Transforms a Single Image into a Detailed 3D Textured Mesh in Approximately One Minute

Researchers from UC San Diego, Zhejiang University, Tsinghua University, UCLA, and Stanford University have introduced One-2-3-45++, an innovative AI method for rapid, high-fidelity 3D object generation. The approach leverages 2D diffusion models, first fine-tuning them for consistent multi-view image generation. A multi-view conditioned, 3D-native diffusion model then transforms these images into a detailed 3D textured mesh. The technique synthesizes high-quality, diverse 3D assets that closely resemble the input image in approximately one minute, addressing the speed and fidelity challenges of practical applications.
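At a high level, the pipeline is a two-step composition: 2D multi-view synthesis followed by multi-view conditioned 3D diffusion. The sketch below shows one way such a pipeline could be wired together; every function here is a hypothetical placeholder returning dummy data, standing in for the corresponding model rather than the authors' released code.

```python
# Minimal structural sketch of the pipeline described above.
# All functions are hypothetical placeholders (returning dummy data),
# not the authors' actual models or API.
import numpy as np


def generate_multiview_images(image: np.ndarray, n_views: int = 6) -> np.ndarray:
    """Placeholder for the fine-tuned 2D diffusion model that produces
    a set of pose-consistent views of the object."""
    return np.stack([image] * n_views)  # dummy: just repeat the input view


def lift_to_textured_mesh(views: np.ndarray) -> dict:
    """Placeholder for the multi-view conditioned, 3D-native diffusion model
    that turns the views into a textured mesh."""
    return {"vertices": np.zeros((0, 3)),
            "faces": np.zeros((0, 3), dtype=int),
            "texture": np.zeros((256, 256, 3))}  # dummy mesh


def image_to_3d(image: np.ndarray) -> dict:
    # Stage 1: consistent multi-view image generation from one RGB image.
    views = generate_multiview_images(image)
    # Stage 2: lift the views into a detailed 3D textured mesh.
    return lift_to_textured_mesh(views)


if __name__ == "__main__":
    rgb = np.random.rand(320, 320, 3).astype(np.float32)  # stand-in input image
    mesh = image_to_3d(rgb)
    print(mesh["texture"].shape)
```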

One-2-3-45++ generates a high-fidelity 3D object from a single RGB image in under one minute. Leveraging the generated multi-view images, the approach refines the texture of the output mesh through a lightweight optimization process. Comparative evaluations show that One-2-3-45++ outperforms baseline methods in CLIP similarity and user preference scores. The authors also emphasize how important the multi-view images are to the efficacy of the 3D diffusion module, demonstrating improvements over existing approaches in consistent multi-view generation.
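The "lightweight optimization" for texture refinement amounts to adjusting the mesh texture so its renderings match the generated multi-view images. The toy example below illustrates that optimization loop with a stand-in differentiable "renderer" (a simple grid_sample texture lookup); the real method renders the actual mesh with its cameras, so treat this purely as an illustration of the pattern, not the paper's implementation.

```python
# Toy illustration of texture refinement by matching multi-view targets.
# A real pipeline would use a differentiable mesh renderer; here a simple
# grid_sample texture lookup stands in for rendering, just to show the loop.
import torch
import torch.nn.functional as F

n_views, tex_res, img_res = 6, 256, 128

# Learnable texture map (3 x H x W), initialized to gray.
texture = torch.full((1, 3, tex_res, tex_res), 0.5, requires_grad=True)

# Fixed per-view UV sampling grids (stand-in for mesh/camera geometry) and
# target multi-view images (stand-in for the diffusion model's outputs).
uv_grids = torch.rand(n_views, img_res, img_res, 2) * 2 - 1
target_views = torch.rand(n_views, 3, img_res, img_res)

optimizer = torch.optim.Adam([texture], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    # "Render" each view by sampling the texture with that view's UV grid.
    rendered = F.grid_sample(texture.expand(n_views, -1, -1, -1),
                             uv_grids, align_corners=False)
    loss = F.mse_loss(rendered, target_views)
    loss.backward()
    optimizer.step()

print(f"final rendering loss: {loss.item():.4f}")
```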

The research addresses the challenge of generating 3D shapes from single images or text prompts, which is crucial for many applications. Existing methods struggle to generalize to unseen categories because of the scarcity of 3D training data. One-2-3-45++ overcomes the shortcomings of its predecessor, One-2-3-45, by simultaneously predicting consistent multi-view images and utilizing a multi-view conditioned, 3D diffusion-based module for efficient and realistic 3D reconstruction. The approach achieves high-quality results with fine-grained control in under a minute, outperforming baseline methods.

The One-2-3-45++ model, trained on extensive pairings of multi-view images and 3D shapes, employs a separate diffusion network for each stage. The first stage uses dense 3D convolutions to generate the full 3D occupancy volume, while the second stage applies 3D sparse convolutions to refine detail within the occupied volume. A lightweight refinement module guided by the multi-view images then enhances texture quality. Evaluation metrics, including CLIP similarity and user preference scores, demonstrate the method's superiority over baselines, and a user study validates the output quality while highlighting the method's runtime efficiency compared to existing approaches.
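A rough sketch of that coarse-to-fine idea, dense occupancy prediction first and fine detail only where the object exists, is shown below. The layer sizes are arbitrary, and a per-voxel MLP stands in for the 3D sparse convolutions (which in practice would come from a sparse-convolution library); this illustrates the structure, not the authors' network.

```python
# Sketch of the coarse-to-fine volume idea: a dense 3D CNN predicts a full
# occupancy volume, and the fine stage only processes voxels that the coarse
# stage marked as occupied. Sizes are arbitrary; a per-voxel MLP stands in
# for the 3D sparse convolutions used in the actual method.
import torch
import torch.nn as nn


class CoarseOccupancyNet(nn.Module):
    """Dense 3D convolutions over the full low-resolution grid."""
    def __init__(self, in_ch: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 3, padding=1),  # occupancy logits
        )

    def forward(self, x):
        return self.net(x)


class FineSparseStage(nn.Module):
    """Stand-in for the sparse fine stage: runs only on occupied voxels."""
    def __init__(self, in_ch: int = 8, out_ch: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_ch, 64), nn.ReLU(),
                                 nn.Linear(64, out_ch))

    def forward(self, features, occupancy_logits, threshold=0.0):
        # Keep only voxels the coarse stage considers occupied.
        mask = occupancy_logits.squeeze(1) > threshold          # (B, D, H, W)
        occupied_feats = features.permute(0, 2, 3, 4, 1)[mask]  # (N_occ, C)
        return self.mlp(occupied_feats), mask


# Dummy conditioning features on a 32^3 grid (these would be derived from
# the multi-view images in the real pipeline).
feats = torch.randn(1, 8, 32, 32, 32)
coarse = CoarseOccupancyNet()
fine = FineSparseStage()

occ_logits = coarse(feats)
fine_out, occ_mask = fine(feats, occ_logits)
print(occ_mask.sum().item(), "occupied voxels ->", fine_out.shape)
```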

One-2-3-45++ surpasses baseline methods in CLIP similarity and user preference scores, showcasing superior quality and performance. The refinement module enhances texture quality, leading to higher CLIP similarity scores. Additionally, the method offers notable runtime advantages compared to optimization-based methods, delivering prompt results.
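The CLIP similarity reported in such evaluations is typically the cosine similarity between CLIP image embeddings of the input image and a rendering of the generated 3D asset. Below is a minimal sketch using the Hugging Face transformers CLIP model; the checkpoint choice and the placeholder image paths are illustrative assumptions, not necessarily what the paper used.

```python
# Minimal sketch of a CLIP-similarity score between an input image and a
# rendered view of the generated 3D asset. Uses the Hugging Face CLIP model;
# the checkpoint and image paths here are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def clip_similarity(image_a: Image.Image, image_b: Image.Image) -> float:
    inputs = processor(images=[image_a, image_b], return_tensors="pt")
    emb = model.get_image_features(**inputs)       # (2, D) image embeddings
    emb = emb / emb.norm(dim=-1, keepdim=True)     # L2-normalize
    return float((emb[0] * emb[1]).sum())          # cosine similarity


# Example: compare the input photo with a rendering of the generated mesh.
input_image = Image.open("input.png").convert("RGB")     # placeholder path
rendered_view = Image.open("render.png").convert("RGB")  # placeholder path
print(f"CLIP similarity: {clip_similarity(input_image, rendered_view):.3f}")
```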

In conclusion, One-2-3-45++ is a highly efficient method that produces high-quality 3D textured meshes from a single image quickly and accurately. A user study validated its superiority over text-to-3D methods in terms of quality and alignment with the input image. It also delivers results fast, outperforming optimization-based alternatives.

Future research should focus on leveraging larger and more diverse 3D training datasets, exploring additional post-processing techniques, optimizing the texture refinement module, conducting more extensive user studies, and integrating other types of information. When the method is applied in domains such as virtual reality, gaming, and computer-aided design, it will be important to assess its effectiveness and potential impact.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.




