Latest Computer Vision Research From Cornell and Adobe Proposes An Artificial Intelligence (AI) Method To Transfer The Artistic Features Of An Arbitrary Style Image To A 3D Scene

On Oct 25, 2022

Art is a fascinating yet extremely complex discipline. Creating artistic images is not only time-consuming but also requires significant expertise. If this holds for 2D artworks, imagine extending it to dimensions beyond the image plane, such as time (in animated content) or 3D space (with sculptures or virtual environments). Each extension introduces new constraints and challenges, which this paper addresses.

Previous work on 2D stylization of video processes the content frame by frame. The individual frames achieve high-quality stylization, but the resulting video often shows flickering artifacts because the frames lack temporal coherence. These methods also do not consider the 3D environment, which would further increase the complexity of the task. Other works that focus on 3D stylization suffer from geometrically inaccurate reconstructions of point clouds or triangle meshes and from a lack of style details. The reason lies in the different geometrical properties of the starting mesh and the produced mesh, as the style is applied after a linear transformation.

The proposed method, termed Artistic Radiance Fields (ARF), can transfer the artistic features of a single 2D image to a real-world 3D scene, producing artistic novel-view renderings that are faithful to the input style image (Fig. 1).

Source: https://arxiv.org/pdf/2206.06360.pdf

For this purpose, the researchers convert a photo-realistic radiance field, reconstructed from multiple images of a real-world scene, into a new, stylized radiance field that supports high-quality stylized renderings from novel viewpoints. The results are shown in Fig. 1.

As an example, given as input a set of real-world pictures of an excavator and an image of Van Gogh’s famous painting “Starry Night” as the style to apply, the result is a colorful excavator with a smooth texture resembling the painting.

The ARF pipeline is presented in the figure below (Fig. 2).

The key point of this architecture is the coupling of the proposed Nearest Neighbor Feature Matching (NNFM) loss with color transfer.

The NNFM loss compares the feature maps of the rendered and style images, extracted using the well-known VGG-16 Convolutional Neural Network (CNN). In this way, the features guide the transfer of complex high-frequency visual details consistently across multiple viewpoints.
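
The idea behind the NNFM loss can be illustrated with a minimal sketch. It assumes the loss is the mean cosine distance from each rendered feature vector to its nearest style feature vector; the function name nnfm_loss and the random arrays standing in for VGG-16 feature maps are illustrative assumptions, not the authors' code.

```python
import numpy as np

def nnfm_loss(render_feats, style_feats, eps=1e-8):
    """Mean cosine distance from each rendered feature to its nearest style feature.

    render_feats: (N, C) array, one C-dimensional feature vector per rendered location.
    style_feats:  (M, C) array, one feature vector per style-image location.
    """
    # Normalize rows to unit length so a dot product equals cosine similarity.
    r = render_feats / (np.linalg.norm(render_feats, axis=1, keepdims=True) + eps)
    s = style_feats / (np.linalg.norm(style_feats, axis=1, keepdims=True) + eps)
    cos_dist = 1.0 - r @ s.T          # (N, M) pairwise cosine distances
    nearest = cos_dist.min(axis=1)    # distance to the nearest style feature
    return float(nearest.mean())

rng = np.random.default_rng(0)
style = rng.normal(size=(50, 16))     # stand-in for VGG-16 style features (assumption)
render = rng.normal(size=(40, 16))    # stand-in for VGG-16 rendered-view features (assumption)

loss_random = nnfm_loss(render, style)
# Scaled copies of style features point in the same direction as their originals,
# so every one of them has a nearest neighbor at cosine distance close to zero.
loss_matched = nnfm_loss(style[:40] * 3.0, style)
```

Because each rendered feature is matched independently to its closest style feature, the loss rewards local style details rather than only global feature statistics.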

Color transfer, instead, is a technique used to avoid a noticeable color mismatch between the synthesized views and the style image. It involves a linear transformation of the pixels of the input images so that their mean and covariance match those of the pixels in the style image.
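
One way to build such a linear transformation is sketched below. It is a minimal example under stated assumptions: the matrix is formed from symmetric matrix square roots of the two covariance matrices (other factorizations would also match the statistics), and the helper names and random pixel arrays are hypothetical.

```python
import numpy as np

def _sqrtm_psd(mat, inverse=False, eps=1e-8):
    """Square root (or inverse square root) of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, eps, None)   # guard against zero or slightly negative eigenvalues
    power = -0.5 if inverse else 0.5
    return (vecs * vals**power) @ vecs.T

def match_color(content, style):
    """Linearly map content pixels so their mean and covariance match the style pixels.

    content, style: (N, 3) and (M, 3) float arrays of RGB values.
    """
    mu_c, mu_s = content.mean(axis=0), style.mean(axis=0)
    cov_c = np.cov(content, rowvar=False)
    cov_s = np.cov(style, rowvar=False)
    # A whitens the content colors, then re-colors them with the style covariance.
    A = _sqrtm_psd(cov_s) @ _sqrtm_psd(cov_c, inverse=True)
    return (content - mu_c) @ A.T + mu_s

rng = np.random.default_rng(0)
content = rng.uniform(0.0, 1.0, size=(1000, 3))                             # toy input pixels
style = rng.uniform(0.2, 0.6, size=(800, 3)) * np.array([1.0, 0.5, 0.25])   # toy style pixels
recolored = match_color(content, style)
```

The sketch does not clip the output to a valid color range; a real pipeline would likely need to do so before using the recolored images.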

In addition, the architecture employs a deferred back-propagation method, which allows losses to be computed on full-resolution images with a reduced load on the GPU. The first step renders the image at full resolution and computes the image loss and its gradient with respect to the pixel colors, which produces a cached gradient image. Then, these cached gradients are back-propagated patch by patch, and the resulting gradients are accumulated.
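
The two steps can be sketched on a toy problem. Everything here is an assumption made for illustration: the "renderer" is a small tanh model with hand-written gradients standing in for a radiance field and an autodiff framework, and the loss is a simple function of the global image mean rather than a VGG-based loss.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K = 8, 8, 5                       # tiny image and parameter count (toy values)
basis = rng.normal(size=(H * W, K))     # fixed per-pixel weights of the toy renderer
target_mean = 0.3

def render(theta, idx):
    """Toy differentiable renderer: pixel values for the flat pixel indices idx."""
    return np.tanh(basis[idx] @ theta)

def render_vjp(theta, idx, grad_pixels):
    """Back-propagate pixel gradients through the toy renderer to theta."""
    pix = np.tanh(basis[idx] @ theta)
    return basis[idx].T @ (grad_pixels * (1.0 - pix**2))

def image_loss_and_grad(img):
    """A loss that needs the whole image: squared error of the global mean."""
    diff = img.mean() - target_mean
    return diff**2, np.full_like(img, 2.0 * diff / img.size)

theta = rng.normal(size=K)
all_idx = np.arange(H * W)

# Step 1: render at full resolution (no gradient bookkeeping) and cache dL/dpixel.
full_img = render(theta, all_idx)
loss, cached_grad = image_loss_and_grad(full_img)

# Step 2: re-render patch by patch and accumulate the parameter gradients.
grad_theta = np.zeros(K)
patch_size = 16
for start in range(0, H * W, patch_size):
    idx = all_idx[start:start + patch_size]
    grad_theta += render_vjp(theta, idx, cached_grad[idx])

# Reference: back-propagating through the full image at once gives the same gradient.
grad_reference = render_vjp(theta, all_idx, cached_grad)
```

In this sketch only one patch is differentiated at a time, which is where the memory saving would come from in a real renderer; the accumulated gradient matches the one obtained from the full image.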

The approach presented in this paper, ARF, brings several advantages. First, it produces stylized images almost without artifacts. Second, the stylized images can be produced from novel views with only a few input images, enabling artistic 3D reconstructions. Lastly, by employing the deferred back-propagation method, the architecture significantly reduces the GPU memory footprint.

This article is written as a research summary by Marktechpost Staff based on the research paper 'ARF: Artistic Radiance Fields'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub link and project.

Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.
