NeRF: The Challenge of Editing the Content of Neural Radiance Fields

Earlier this year, NVIDIA notably advanced Neural Radiance Fields (NeRF) research with InstantNeRF, which is apparently capable of generating explorable neural scenes in mere seconds, using a technique that, when it emerged in 2020, frequently took hours or even days to train.

NVIDIA's InstantNeRF provides impressive and rapid results. Source: https://www.youtube.com/watch?v=DJ2hcC1orc4

Though this kind of interpolation produces a static scene, NeRF is also capable of depicting movement, and of basic ‘copy-and-paste’ editing, where individual NeRFs can either be collated into composite scenes or inserted into existing scenes.

Nested NeRFs, featured in 2021 research from Shanghai Tech University and DGene Digital Technology. Source: https://www.youtube.com/watch?v=Wp4HfOwFGP4

However, if you’re looking to intervene in a calculated NeRF and actually change something that’s going on inside it (in the same way you can change elements in a traditional CGI scene), the fast-moving research sector has produced very few solutions to date, and none that even begin to match the capabilities of CGI workflows.

Though geometry estimation is essential to creating a NeRF scene, the final result is composed of fairly ‘locked’ values. While there is some progress being made towards changing texture values in NeRF, the actual objects in a NeRF scene are not parametric meshes that can be edited and played about with, but more akin to brittle and frozen point clouds.

In this scenario, a rendered person in a NeRF is essentially a statue (or a series of statues, in video NeRFs); the shadows they cast on themselves and other objects are textures, rather than flexible calculations based on light sources; and the editability of NeRF content is limited to the choices made by the photographer who takes the sparse source photos from which the NeRF is generated. Parameters such as shadows and pose remain non-editable, in any creative sense.

NeRF-Editing

A new academic research collaboration between China and the UK addresses this challenge with NeRF-Editing, where proxy CGI-style meshes are extracted from a NeRF, deformed at will by the user, and the deformations passed back through to the NeRF’s neural calculations:

NeRF puppetry with NeRF-Editing, as the deformations calculated from footage are applied to equivalent points inside a NeRF representation. Source: http://geometrylearning.com/NeRFEditing/

The method adapts NeuS, a 2021 US/China reconstruction technique, which extracts a Signed Distance Function (SDF, a much older method of volumetric reconstruction) that’s able to learn the geometry represented inside the NeRF.
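For readers unfamiliar with the idea, an SDF simply reports how far a point is from a surface, with the sign indicating which side of the surface the point is on. The sketch below is a hand-written SDF for a sphere, not the learned network that NeuS trains; the function names and the negative-inside sign convention are assumptions made for illustration.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere.

    Convention assumed here: negative inside, zero on the surface,
    positive outside.
    """
    return math.dist(point, center) - radius


def is_on_surface(point, sdf, tolerance=1e-6):
    """A point lies on the surface where the SDF is (almost) zero."""
    return abs(sdf(point)) <= tolerance


# Illustrative sample points: one inside, one on, one outside the unit sphere.
samples = {
    "inside": (0.0, 0.0, 0.5),
    "surface": (1.0, 0.0, 0.0),
    "outside": (0.0, 2.0, 0.0),
}
for label, p in samples.items():
    print(label, sphere_sdf(p), is_on_surface(p, sphere_sdf))
```

The surface of the object is the set of points where the function returns zero, which is why a mesh can be extracted from an SDF once it has been learned.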

This SDF object becomes the user’s sculpting base, with warping and molding capabilities provided by the venerable As-Rigid-As-Possible (ARAP) technique.
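The core idea of ARAP is to penalize any deformation that is not a local rotation, so that each small neighborhood of the mesh keeps its shape as far as possible. The following is a minimal 2D sketch of the ARAP energy with uniform edge weights, assuming a hypothetical `neighbors` adjacency map; real ARAP implementations work on 3D meshes, typically use cotangent weights, and also solve for the new vertex positions, none of which is shown here.

```python
import math


def arap_energy_2d(rest, deformed, neighbors):
    """As-rigid-as-possible energy in 2D with uniform edge weights.

    rest, deformed: lists of (x, y) vertex positions.
    neighbors: dict mapping a vertex index to its adjacent vertex indices.
    """
    total = 0.0
    for i, ring in neighbors.items():
        dot_sum = 0.0
        cross_sum = 0.0
        edges = []
        for j in ring:
            p = (rest[j][0] - rest[i][0], rest[j][1] - rest[i][1])
            q = (deformed[j][0] - deformed[i][0], deformed[j][1] - deformed[i][1])
            dot_sum += p[0] * q[0] + p[1] * q[1]
            cross_sum += p[0] * q[1] - p[1] * q[0]
            edges.append((p, q))
        # Best-fit rotation angle for this one-ring (closed form in 2D).
        theta = math.atan2(cross_sum, dot_sum)
        c, s = math.cos(theta), math.sin(theta)
        # Whatever the best rotation cannot explain counts as non-rigid.
        for p, q in edges:
            rx = c * p[0] - s * p[1]
            ry = s * p[0] + c * p[1]
            total += (q[0] - rx) ** 2 + (q[1] - ry) ** 2
    return total


# A unit square whose vertices are connected around the boundary.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

angle = math.radians(30)
rotated = [(math.cos(angle) * x - math.sin(angle) * y,
            math.sin(angle) * x + math.cos(angle) * y) for x, y in square]
stretched = [(2.0 * x, y) for x, y in square]

print("rotated:", arap_energy_2d(square, rotated, ring))
print("stretched:", arap_energy_2d(square, stretched, ring))
```

A pure rotation costs essentially nothing, while stretching the square is penalized; an ARAP solver searches for the vertex positions that satisfy the user's handle constraints at the lowest such cost.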

ARAP allows users to deform the extracted SDF mesh, though other methods, such as skeleton-based and cage-based approaches (e.g. NURBS), would also work well. Source: https://arxiv.org/pdf/2205.04978.pdf

With the deformations applied, it’s necessary to translate this information from vector to the RGB/pixel level native to NeRF, which is a slightly longer journey.

The triangle mesh that the user has deformed is first converted into a tetrahedral mesh, which forms a skin around the user-mesh. A spatially discrete deformation field is extracted from this additional mesh, and finally a NeRF-friendly continuous deformation field is obtained, which can be passed back into the neural radiance environment, reflecting the user’s changes and edits, and directly affecting the interpreted rays in the target NeRF.
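One standard way to turn per-vertex displacements of a tetrahedral mesh into a continuous field is barycentric interpolation inside each tetrahedron. The sketch below illustrates that step for a single tetrahedron; it is an illustration under that assumption rather than the authors' code, and all function names are hypothetical.

```python
def sub(a, b):
    """Component-wise a - b for 3D tuples."""
    return tuple(x - y for x, y in zip(a, b))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def barycentric(point, tet):
    """Barycentric weights of `point` with respect to the 4 vertices of `tet`."""
    v0, v1, v2, v3 = tet
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    r = sub(point, v0)
    volume = det3(e1, e2, e3)
    if volume == 0:
        raise ValueError("degenerate tetrahedron")
    # Cramer's rule for r = l1*e1 + l2*e2 + l3*e3.
    l1 = det3(r, e2, e3) / volume
    l2 = det3(e1, r, e3) / volume
    l3 = det3(e1, e2, r) / volume
    return (1.0 - l1 - l2 - l3, l1, l2, l3)


def interpolate_displacement(point, tet, vertex_displacements):
    """Blend the four per-vertex displacements at an interior point."""
    weights = barycentric(point, tet)
    return tuple(
        sum(w * d[k] for w, d in zip(weights, vertex_displacements))
        for k in range(3)
    )


# One tetrahedron; only its second vertex has been moved (up by 1 in z).
tet = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
moves = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))

print(interpolate_displacement((0.5, 0.25, 0.25), tet, moves))
```

The displacement is known exactly only at the mesh vertices (the discrete field); interpolation supplies a value at every point in between, which is what a ray sampler needs.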

Objects deformed and animated by the new method.

The paper states:

‘After transferring the surface deformation to the tetrahedral mesh, we can obtain the discrete deformation field of the “effective space”. We now utilize these discrete transformations to bend the casting rays. To generate an image of the deformed radiance field, we cast rays to the space containing the deformed tetrahedral mesh.’
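In practical terms, ray bending means that sample points are taken along a straight ray in the edited scene, and each sample is then shifted back to where that content sat in the original, unedited NeRF before the network is queried. The sketch below shows only that bookkeeping, with a toy stand-in for the deformation field; `toy_backward_offset` and the other names are hypothetical, and no actual radiance field is evaluated.

```python
def sample_ray(origin, direction, near, far, num_samples):
    """Evenly spaced sample points along a straight ray."""
    if num_samples < 2:
        raise ValueError("need at least two samples")
    step = (far - near) / (num_samples - 1)
    return [
        tuple(o + (near + i * step) * d for o, d in zip(origin, direction))
        for i in range(num_samples)
    ]


def bend_samples(points, backward_offset):
    """Shift each sample from the edited space back into the original space."""
    return [
        tuple(p + o for p, o in zip(point, backward_offset(point)))
        for point in points
    ]


def toy_backward_offset(point):
    """Stand-in deformation field (an assumption for this example).

    Pretend the user lifted everything with 1 <= x <= 2 upward by 0.5,
    so samples in that region must look 0.5 lower in the original scene.
    """
    x, _, _ = point
    return (0.0, 0.0, -0.5) if 1.0 <= x <= 2.0 else (0.0, 0.0, 0.0)


straight = sample_ray((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.0, 3.0, 4)
bent = bend_samples(straight, toy_backward_offset)
for before, after in zip(straight, bent):
    print(before, "->", after)
```

The bent sample positions, rather than the straight ones, would then be passed to the trained NeRF, which is how an edit made on a mesh ends up changing the rendered image without retraining the network.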

The paper is titled NeRF-Editing: Geometry Editing of Neural Radiance Fields, and comes from researchers across three Chinese universities and institutions, together with a researcher from the School of Computer Science & Informatics at Cardiff University, and another two researchers from the Alibaba Group.

Limitations

As mentioned earlier, transformed geometry will not ‘update’ any related aspects in the NeRF that have not been edited, nor reflect secondary consequences of the deformed element, such as shadows. The researchers provide an example, where under-shadows on a human figure in a NeRF remain unaltered, even though the deformation should alter the lighting:

From the paper: we see that the horizontal shadow on the figure's arm remains in place even as the arm is moved upward.

Experiments

The authors observe that there are currently no comparable methods for direct intervention into NeRF geometry. Therefore the experiments conducted for the research were more exploratory than comparative.

The researchers demonstrated NeRF-Editing on a number of public datasets, including characters from Mixamo, and the now-iconic Lego bulldozer and chair from the original NeRF implementation. They also experimented on a real captured horse statue from the FVS dataset, as well as their own original captures.

A horse's head tilted.

For future work, the authors intend to develop their system in the just-in-time (JIT) compiled machine learning framework Jittor.

First published 16th May 2022.
