CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis

Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park

TL;DR

CodecNeRF addresses the challenge of turning NeRF into ubiquitously transmittable 3D media by coupling a forward-pass encoder–decoder with test-time finetuning. It introduces 3D feature construction from multi-view images, vector-quantized compression of the 3D features into multi-resolution triplanes, and a two-headed MLP renderer, augmented with parameter-efficient fine-tuning (PEFT) and entropy coding of deltas. Empirical results on Objaverse, Google Scanned Objects (GSO), and DTU show more than 100x compression, faster encoding, and image quality comparable to or better than strong baselines. This work moves toward practical 3D content delivery over networks and opens avenues for further research on compression and 3D codecs.
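The vector-quantized triplane compression mentioned above can be illustrated with a minimal sketch of generic vector quantization. It rests on assumptions of our own: each plane is a flattened grid of C-dimensional feature vectors, all planes at all resolutions share one codebook, and each vector is replaced by the index of its nearest codebook entry. The plane naming (`xy@16`), the random codebook (a real codec would learn it), and every function name are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Plane = List[Vector]  # one plane = flattened H*W grid of C-dimensional features


def nearest_code(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry with the smallest squared L2 distance to v."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(v, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def encode_triplanes(planes: Dict[str, Plane], codebook: Sequence[Vector]) -> Dict[str, List[int]]:
    """Encoder side: replace every feature vector on every plane by a codebook index."""
    return {name: [nearest_code(v, codebook) for v in plane] for name, plane in planes.items()}


def decode_triplanes(codes: Dict[str, List[int]], codebook: Sequence[Vector]) -> Dict[str, Plane]:
    """Decoder side: look each index back up in the shared codebook."""
    return {name: [list(codebook[i]) for i in idx] for name, idx in codes.items()}


def vq_size_bits(codes: Dict[str, List[int]], codebook_size: int, dim: int,
                 float_bits: int = 32) -> int:
    """Fixed-length index bits for all planes plus the float cost of the codebook."""
    bits_per_index = max(1, math.ceil(math.log2(codebook_size)))
    num_indices = sum(len(idx) for idx in codes.values())
    return num_indices * bits_per_index + codebook_size * dim * float_bits


if __name__ == "__main__":
    random.seed(0)
    dim, codebook_size = 8, 64
    codebook = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(codebook_size)]
    # Three axis-aligned planes at two resolutions (hypothetical naming "xy@8", "xy@16", ...).
    planes = {
        f"{axes}@{res}": [[random.gauss(0, 1) for _ in range(dim)] for _ in range(res * res)]
        for res in (8, 16)
        for axes in ("xy", "xz", "yz")
    }
    codes = encode_triplanes(planes, codebook)
    restored = decode_triplanes(codes, codebook)
    raw_bits = sum(len(p) for p in planes.values()) * dim * 32
    ratio = raw_bits / vq_size_bits(codes, codebook_size, dim)
    print(f"{len(restored)} planes, compression ratio ~ {ratio:.1f}x")
```

Fixed-length indices are the simplest choice here. Because learned codebooks are rarely used uniformly, an entropy coder over the index histogram could shrink the index stream further.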

Abstract

Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, for NeRF to become as ubiquitous as everyday media formats such as images and videos, we need to fulfill three key objectives: 1. fast encoding and decoding times, 2. compact model sizes, and 3. high-quality renderings. Despite recent advancements, a comprehensive algorithm that adequately addresses all three objectives has yet to be realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by recent parameter-efficient finetuning approaches, we propose a finetuning method that efficiently adapts the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a new encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 100x and a remarkable reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets.
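The finetuning step keeps codes compact because only a small update, rather than the whole representation, needs to be transmitted on top of what the forward-pass decoder already produces. The sketch below assumes a LoRA-style low-rank update W = W0 + A @ B as the parameter-efficient form; that choice, the uniform quantization step, and all names are our own assumptions, not the paper's specification. It estimates the bit cost from the empirical entropy of the quantized factors (the bound an arithmetic or range coder approaches) instead of running an actual entropy coder.

```python
import math
import random
from collections import Counter
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (m x r) @ (r x n) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_low_rank_update(w0: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """W = W0 + A @ B: W0 comes from the forward-pass decoder, only A and B are finetuned."""
    delta = matmul(a, b)
    return [[w + d for w, d in zip(row_w, row_d)] for row_w, row_d in zip(w0, delta)]


def quantize(values: List[float], step: float) -> List[int]:
    """Uniform scalar quantization of real values to integer symbols."""
    return [round(v / step) for v in values]


def dequantize(symbols: List[int], step: float) -> List[float]:
    return [s * step for s in symbols]


def entropy_bits(symbols: List[int]) -> float:
    """Ideal total code length in bits under the symbols' empirical histogram.

    An arithmetic/range coder using this histogram approaches this bound; the cost
    of transmitting the histogram itself is ignored in this sketch."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    random.seed(0)
    m, n, r, step = 64, 64, 2, 0.01
    # Stand-ins for finetuned low-rank factors (random here, not trained).
    a = [[random.gauss(0, 0.02) for _ in range(r)] for _ in range(m)]
    b = [[random.gauss(0, 0.02) for _ in range(n)] for _ in range(r)]
    symbols = quantize([x for row in a for x in row] + [x for row in b for x in row], step)
    full_bits = m * n * 32  # baseline: sending the whole finetuned matrix as float32
    print(f"{len(symbols)} symbols, ~{entropy_bits(symbols):.0f} bits "
          f"vs {full_bits} bits for a full float32 matrix")

    # Receiver: rebuild the adapted weights from the dequantized factors.
    a_hat = [dequantize(quantize(row, step), step) for row in a]
    b_hat = [dequantize(quantize(row, step), step) for row in b]
    w = apply_low_rank_update([[0.0] * n for _ in range(m)], a_hat, b_hat)
    print(f"adapted weights: {len(w)} x {len(w[0])}")
```

The low rank keeps the number of transmitted symbols small (2 * 64 * 2 = 256 here versus 4096 entries in the full matrix), and quantizing to a narrow range of integers keeps the per-symbol entropy low.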

Paper Structure (33 sections, 10 equations, 25 figures, 11 tables)

Figures (25)

  • Figure 1: CodecNeRF encoder and decoder architecture.
  • Figure 2: Parameter-efficient finetuning process.
  • Figure 3: Novel view synthesis results on Objaverse dataset.
  • Figure 4: Novel view synthesis results on GSO dataset. 'Triplanes' and 'Ours (PEFT)' are finetuned for 1k iterations.
  • Figure 5: Novel view synthesis results on DTU dataset.
  • ...and 20 more figures