This AI Paper Unveils HiFi4G: A Breakthrough in Photo-Real Human Modeling and Efficient Rendering

On Dec 12, 2023

Volumetric recording and realistic representation of 4D (spacetime) human performance dissolve the barriers between spectators and performers. They enable a variety of immersive VR/AR experiences, such as telepresence and tele-education. Some early systems explicitly use nonrigid registration to reconstruct textured models from recorded footage. However, they remain susceptible to occlusions and texture deficiencies, which lead to holes and noise in the reconstruction output. Recent neural advances, exemplified by NeRF, optimize a coordinate-based multi-layer perceptron (MLP) rather than resorting to explicit reconstruction to achieve photo-realistic volume rendering.

Certain dynamic NeRF variants aim to maintain a canonical feature space and reproduce features in every live frame using an additional implicit deformation field. However, such a canonical design is sensitive to significant topological changes or large motions. Through planar factorization or hash encoding, recent methods eliminate the deformation fields and compactly describe the 4D feature grid. They greatly speed up training and interactive rendering but leave open the issues of runtime memory and storage. More recently, 3D Gaussian Splatting (3DGS) returned to an explicit paradigm for representing static scenes. It enables real-time, high-quality radiance field rendering based on GPU-friendly rasterization of 3D Gaussian primitives. Several ongoing projects adapt 3DGS to dynamic settings.

Some focus on capturing the nonrigid motion of the dynamic Gaussians but lose rendering quality in the process. Others rely on additional implicit deformation fields to compensate for the motion information, so they lose the explicit, GPU-friendly elegance of the original 3DGS and cannot handle long-duration motions. In this study, a research team from ShanghaiTech University, NeuDim, ByteDance, and DGene introduces HiFi4G, a fully explicit and compact Gaussian-based method for recreating high-fidelity 4D human performances from dense video (refer to Fig. 1). Their main idea is to combine nonrigid tracking with the 3D Gaussian representation so that motion and appearance data are separated, yielding a representation that is both compact and compression-friendly. HiFi4G noticeably outperforms current implicit rendering techniques in optimization speed, rendering quality, and storage overhead.

**Figure 1** shows the team's compact Gaussian Splatting results rendered at high resolution. Starting from multi-view human performance video, HiFi4G combines the classic non-rigid fusion technique with recent advances in differentiable rasterization to efficiently generate compact 4D assets.

Thanks to the explicit representation, the results can also be integrated directly into GPU-based rasterization pipelines, allowing users to watch high-fidelity human performances in virtual reality while wearing VR headsets. The research team first proposes a dual-graph technique, comprising a coarse deformation graph and a fine-grained Gaussian graph, to connect the Gaussian representation naturally with nonrigid tracking. For the coarse deformation graph, the team uses NeuS2 to create a per-frame geometry proxy and then applies embedded deformation (ED) in a key-frame fashion. This explicit tracking technique divides the sequence into segments and provides a rich motion prior inside each segment. Similar to a key-volume update, the team limits the number of Gaussians in the current segment by using 3DGS to prune incorrect Gaussians from the previous segment and add new ones.
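To make the embedded deformation (ED) step more concrete, here is a minimal sketch of how an ED graph is commonly used to warp a point: each graph node carries a rotation and a translation, and a point moves by a weighted blend of the rigid transforms of its nearby nodes. The function names, the Gaussian distance weighting, and the `sigma` parameter are illustrative assumptions, not details taken from HiFi4G.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def ed_warp(point, nodes, sigma=0.1):
    """Warp `point` with an embedded deformation graph.

    nodes: list of dicts with keys
        'g' (node position), 'R' (3x3 rotation), 't' (translation).
    The Gaussian falloff weighting and `sigma` are assumptions
    made for illustration only.
    """
    # Weight each node by its distance to the point.
    weights = []
    for n in nodes:
        d2 = sum((point[k] - n["g"][k]) ** 2 for k in range(3))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every node is too far away to matter: leave the point unchanged.
        return list(point)

    # Blend the per-node rigid transforms: R (p - g) + g + t.
    warped = [0.0, 0.0, 0.0]
    for w, n in zip(weights, nodes):
        local = [point[k] - n["g"][k] for k in range(3)]
        rotated = mat_vec(n["R"], local)
        for k in range(3):
            warped[k] += (w / total) * (rotated[k] + n["g"][k] + n["t"][k])
    return warped
```

In a setup like this, the same blend could be evaluated at each Gaussian center to carry the coarse graph motion over to the finer primitives.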

Next, the research team constructs a fine-grained Gaussian graph for further initialization by interpolating the motion of each Gaussian from the coarse ED graph. Naively warping the Gaussian graph with the ED graph and splatting it into screen space produces severe, unnatural distortions, while continuous optimization without any constraints produces jittery artifacts. To properly balance the updating of Gaussian attributes against the nonrigid motion prior, the team therefore proposes a 4D Gaussian optimization approach. It employs a temporal regularizer to keep the appearance attributes of each Gaussian consistent, such as opacity, scaling coefficients, and spherical harmonics (SH). For the motion attributes (position and rotation), it adds a smooth term that produces locally as-rigid-as-possible motion between neighboring Gaussians.
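The two regularizers described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the authors' loss: the function names, the plain squared-error form, and the equal weighting of all attributes and edges are hypothetical choices.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def temporal_loss(attrs_prev, attrs_cur):
    """Mean squared change of appearance attributes between two frames.

    attrs_*: one flat list of floats per Gaussian (for example opacity,
    scaling, and SH coefficients concatenated). Assumed layout.
    """
    total, count = 0.0, 0
    for a, b in zip(attrs_prev, attrs_cur):
        for x, y in zip(a, b):
            total += (x - y) ** 2
            count += 1
    return total / count if count else 0.0

def smooth_loss(pos_prev, pos_cur, rot, edges):
    """As-rigid-as-possible style term over neighboring Gaussians.

    For each edge (i, j), the previous-frame offset from i to j,
    rotated by Gaussian i's rotation, should match the current offset.
    rot: one 3x3 rotation per Gaussian; edges: list of (i, j) pairs.
    """
    total = 0.0
    for i, j in edges:
        e_prev = [pos_prev[j][k] - pos_prev[i][k] for k in range(3)]
        e_cur = [pos_cur[j][k] - pos_cur[i][k] for k in range(3)]
        rotated = mat_vec(rot[i], e_prev)
        total += sum((rotated[k] - e_cur[k]) ** 2 for k in range(3))
    return total / len(edges) if edges else 0.0
```

A rigid translation of all Gaussians leaves `smooth_loss` at zero, while stretching an edge is penalized, which is the behavior an as-rigid-as-possible term is meant to encourage.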

An adaptive weighting mechanism is added to these regularizers to penalize flickering artifacts in regions that exhibit only small nonrigid motion. After optimization, the approach yields spatially and temporally compact 4D Gaussians. To make HiFi4G practical for consumers, the research team also presents a companion compression scheme for the Gaussian parameters that follows the conventional pipeline of residual compensation, quantization, and entropy encoding. With a substantial compression rate of about 25 times and less than 2 MB of storage per frame, it enables immersive viewing of human performances on various devices, including VR headsets.
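The residual, quantization, and entropy-coding stages can be illustrated with a small sketch. It is not the HiFi4G codec: the uniform quantization step, the 32-bit integer packing, and the use of `zlib` (DEFLATE) as a stand-in entropy coder are all assumptions made for illustration, as are the function names.

```python
import struct
import zlib

def compress_frame(values, reference, step=0.01):
    """Compress one frame of parameters against a reference frame.

    values, reference: equal-length lists of floats.
    step: uniform quantization step (hypothetical default).
    """
    # 1. Residual: store only the change relative to the reference.
    residual = [v - r for v, r in zip(values, reference)]
    # 2. Quantization: map each residual to an integer bin index.
    bins = [round(x / step) for x in residual]
    # 3. Entropy coding: zlib stands in for a dedicated entropy coder.
    raw = struct.pack(f"<{len(bins)}i", *bins)
    return zlib.compress(raw, 9)

def decompress_frame(blob, reference, step=0.01):
    """Invert compress_frame up to the quantization error (at most step / 2)."""
    raw = zlib.decompress(blob)
    bins = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [r + b * step for r, b in zip(reference, bins)]
```

Because consecutive frames tend to differ only slightly, most residuals fall into a few bins near zero, which is what lets the final coding stage shrink the data; the trade-off is a bounded reconstruction error controlled by `step`.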

In brief, their primary contributions include the following:

• The research team introduces a compact 4D Gaussian representation that connects Gaussian Splatting with nonrigid tracking for human performance rendering.

• The research team proposes a dual-graph approach that efficiently recovers spatially and temporally consistent 4D Gaussians using different regularization designs.

• The research team provides a complementary compression approach that enables a low-storage, immersive human performance experience across several platforms.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
