Updated 2023-01-30
GeCoNeRF: Few-shot Neural Radiance Fields via Geometric Consistency
Authors: Minseop Kwak, Jiuhn Song, Seungryong Kim
We present a novel framework to regularize Neural Radiance Fields (NeRF) in a few-shot setting with a geometry-aware consistency regularization. The proposed approach leverages a depth map rendered at an unobserved viewpoint to warp sparse input images to that viewpoint and uses them as pseudo ground truths to facilitate learning of NeRF. By encouraging such geometry-aware consistency at the feature level instead of using a pixel-level reconstruction loss, we regularize the NeRF at semantic and structural levels while still allowing it to model view-dependent radiance to account for color variations across viewpoints. We also propose an effective method to filter out erroneous warped solutions, along with strategies to stabilize training during optimization. We show that our model achieves competitive results compared to state-of-the-art few-shot NeRF models. The project page is available at https://ku-cvlab.github.io/GeCoNeRF/.
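The warping step described in this abstract can be illustrated with standard depth-based reprojection. The following is a minimal sketch, not the authors' implementation: the function name `warp_coords`, the shared pinhole intrinsics `K`, and the 4x4 target-to-source transform `T_tgt_to_src` are all assumptions made for illustration, and the depth is assumed to be strictly positive.

```python
import numpy as np

def warp_coords(depth, K, T_tgt_to_src):
    """For each pixel of the target (unobserved) view, return the (u, v)
    location it projects to in the source (input) view.

    depth:        (h, w) rendered depth at the target view, assumed > 0
    K:            (3, 3) pinhole intrinsics, assumed shared by both views
    T_tgt_to_src: (4, 4) rigid transform from target to source camera frame
    """
    h, w = depth.shape
    # Homogeneous pixel grid of the target view, shape (3, h*w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)
    # Back-project pixels to 3D points in the target camera frame.
    pts_tgt = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Express the points in the source camera frame.
    pts_src = T_tgt_to_src[:3, :3] @ pts_tgt + T_tgt_to_src[:3, 3:4]
    # Project into the source image and dehomogenize.
    proj = K @ pts_src
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)

# Hypothetical example values: a 4x4 image at constant depth 2.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 2.0)
coords = warp_coords(depth, K, np.eye(4))
```

Sampling the source image at the returned coordinates (for example with bilinear interpolation) yields the warped image that could serve as a pseudo ground truth. The feature-level loss and the filtering of erroneous warps mentioned in the abstract are not covered by this sketch.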
HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN
Authors: Adam Kania, Artur Kasymov, Maciej Zięba, Przemysław Spurek
Recently, generative models for 3D objects have been gaining popularity in VR and augmented reality applications. Training such models using standard 3D representations, like voxels or point clouds, is challenging and requires complex tools for proper color rendering. To overcome this limitation, Neural Radiance Fields (NeRFs) offer state-of-the-art quality in synthesizing novel views of complex 3D scenes from a small subset of 2D images. In this paper, we propose a generative model called HyperNeRFGAN, which uses the hypernetwork paradigm to produce 3D objects represented by NeRF. Our GAN architecture uses a hypernetwork to transform Gaussian noise into the weights of a NeRF model. The model is then used to render 2D novel views, and a classical 2D discriminator is used to train the entire GAN-based structure. Our architecture produces 2D images, but we use a 3D-aware NeRF representation, which forces the model to produce correct 3D objects. The advantage of the model over existing approaches is that it produces a dedicated NeRF representation for the object without sharing some global parameters of the rendering component. We show the superiority of our approach compared to reference baselines on three challenging datasets from various domains.
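The hypernetwork idea in this abstract, a network whose output is the weight vector of another network, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the HyperNeRFGAN architecture: the layer sizes, the single linear hypernetwork layer, and the names `generate_weights` and `target_field` are all hypothetical, and no rendering or discriminator is included.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, HIDDEN, IN_DIM, OUT_DIM = 8, 16, 3, 4  # illustrative sizes only

# Parameter shapes of the target network: a two-layer MLP mapping a
# 3D point to (r, g, b, density).
SHAPES = [(IN_DIM, HIDDEN), (HIDDEN,), (HIDDEN, OUT_DIM), (OUT_DIM,)]
N_TARGET = sum(int(np.prod(s)) for s in SHAPES)

# The hypernetwork: a single linear layer here, for brevity.
H_W = rng.normal(scale=0.1, size=(Z_DIM, N_TARGET))
H_b = np.zeros(N_TARGET)

def generate_weights(z):
    """Map a noise vector z to the full parameter set of the target MLP."""
    flat = z @ H_W + H_b
    params, start = [], 0
    for shape in SHAPES:
        n = int(np.prod(shape))
        params.append(flat[start:start + n].reshape(shape))
        start += n
    return params

def target_field(points, params):
    """Evaluate the generated MLP at 3D points, returning color and density."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(points @ W1 + b1, 0.0)      # ReLU
    out = hidden @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))         # colors in (0, 1)
    sigma = np.maximum(out[:, 3], 0.0)              # non-negative density
    return rgb, sigma

z = rng.normal(size=Z_DIM)             # Gaussian noise sample
params = generate_weights(z)           # one dedicated field per sample
rgb, sigma = target_field(rng.uniform(-1, 1, size=(5, 3)), params)
```

Each noise sample produces its own complete set of field weights, which is the property the abstract highlights: every generated object gets a dedicated representation rather than conditioning one shared network.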
SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning
Authors: Dongseok Shim, Seungjae Lee, H. Jin Kim
Because previous representations for reinforcement learning cannot effectively incorporate a human-intuitive understanding of the 3D environment, they usually suffer from sub-optimal performance. In this paper, we present Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly optimizes semantic-aware neural radiance fields (NeRF) with a convolutional encoder to learn a 3D-aware neural implicit representation from multi-view images. We introduce 3D semantic and distilled feature fields in parallel to the RGB radiance fields in NeRF to learn semantic and object-centric representations for reinforcement learning. SNeRL outperforms not only previous pixel-based representations but also recent 3D-aware representations in both model-free and model-based reinforcement learning.
First two authors contributed equally. Order was determined by coin flip.
Text-To-4D Dynamic Scene Generation
Authors: Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. MAV3D does not require any 3D or 4D data and the T2V model is trained only on Text-Image pairs and unlabeled videos. We demonstrate the effectiveness of our approach using comprehensive quantitative and qualitative experiments and show an improvement over previously established internal baselines. To the best of our knowledge, our method is the first to generate 3D dynamic scenes given a text description.