Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks

Sudarshan Devkota, Sumanta Pattanaik

TL;DR

The paper addresses the challenge of efficiently compressing and representing large volumetric datasets. It introduces a coordinate-based neural network framework that uses multi-resolution hash encoding to map spatial coordinates to voxel intensities, and adds Reptile-based meta-learning to initialize weights for faster convergence. Through extensive experiments against Neurcomp and Tthresh, the method shows improved PSNR and training efficiency, while enabling direct rendering from the compressed neural representation. The results highlight the potential of hash-encoded coordinate networks for scalable volume visualization, with caveats on data-dependent performance and artifacts at very high compression.

Abstract

In this paper, we propose an efficient approach for the compression and representation of volumetric data utilizing coordinate-based networks and multi-resolution hash encoding. Efficient compression of volumetric data is crucial for various applications, such as medical imaging and scientific simulations. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values. We compare different encoding schemes and demonstrate the superiority of multi-resolution hash encoding in terms of compression quality and training efficiency. Furthermore, we leverage optimization-based meta-learning, specifically using the Reptile algorithm, to learn weight initialization for neural representations tailored to volumetric data, enabling faster convergence during optimization. Additionally, we compare our approach with state-of-the-art methods to showcase improved image quality and compression ratios. These findings highlight the potential of coordinate-based networks and multi-resolution hash encoding for an efficient and accurate representation of volumetric data, paving the way for advancements in large-scale data visualization and other applications.
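The Reptile-based initialization mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the hash-encoded network is replaced by a toy two-parameter linear model, each "task" is a hypothetical 1D line fit standing in for one volume, and the names `inner_sgd` and `reptile` and all step sizes are assumptions chosen for illustration. Only the update rule itself (run a few SGD steps on one task, then move the shared weights a fraction of the way toward the adapted weights) is the standard Reptile step.

```python
import numpy as np

def inner_sgd(theta, xs, ys, steps=5, lr=0.1):
    """Adapt weights to one task with a few gradient steps on the MSE of y ~ w*x + b."""
    phi = theta.copy()
    for _ in range(steps):
        err = phi[0] * xs + phi[1] - ys
        grad = np.array([np.mean(2 * err * xs), np.mean(2 * err)])
        phi -= lr * grad
    return phi

def reptile(tasks, theta, meta_steps=200, meta_lr=0.1, rng=None):
    """Reptile outer loop: theta <- theta + meta_lr * (phi - theta)."""
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(meta_steps):
        xs, ys = tasks[rng.integers(len(tasks))]  # sample one task
        phi = inner_sgd(theta, xs, ys)            # task-adapted weights
        theta = theta + meta_lr * (phi - theta)   # interpolate toward them
    return theta

def mse(theta, xs, ys):
    return float(np.mean((theta[0] * xs + theta[1] - ys) ** 2))

# Hypothetical toy tasks: lines with similar slopes and intercepts.
xs = np.linspace(-1.0, 1.0, 32)
tasks = [(xs, w * xs + b) for w, b in [(1.8, 0.9), (2.0, 1.0), (2.2, 1.1)]]
meta_theta = reptile(tasks, np.zeros(2))

# Held-out task: compare the loss after the same few steps from each start.
test_ys = 2.1 * xs + 1.05
loss_meta = mse(inner_sgd(meta_theta, xs, test_ys), xs, test_ys)
loss_zero = mse(inner_sgd(np.zeros(2), xs, test_ys), xs, test_ys)
```

In this toy setting the meta-learned start reaches a lower loss than the zero start after the same number of gradient steps. That mirrors the faster early convergence the paper reports, but it says nothing about how large the effect is on real volumes.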

Paper Structure

This paper contains 19 sections, 4 equations, 10 figures, 3 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Overview of multi-resolution hash encoding. In this two-dimensional example, we first break the image into $L$ resolution levels (grids). The figure shows an example with two levels, $L:0$ and $L:1$. For a given normalized input coordinate $x$, the integer coordinates of the surrounding corners are hashed to obtain an index into a hash table of size $T$. Every entry in the hash table is a trainable feature vector of size $W$. In the example shown, we have $L=2$, $W=4$, and $T=8$. The feature vectors from the surrounding corners are linearly interpolated to obtain the feature vector at coordinate $x$. The feature vectors for $x$ from all levels are then concatenated to form the final encoded vector for input $x$.
  • Figure 2: Comparison of various encoding schemes for compressing the skull dataset. Each configuration was trained for 50 epochs, where each epoch is one complete pass over the entire volume. The top-left image (a) is the ground-truth rendering. Reconstruction quality is assessed using PSNR and SSIM, and the time to compress (TC) and compression ratio (CR) are given beneath each image. Notably, the rendering reconstructed with hash encoding is comparable in quality to triangle wave (c) and one-blob encoding (e), but its compression time is at least 2x faster.
  • Figure 3: Compression error against the number of epochs for various combinations of $L$ and $W$, with the total number of encoding parameters held constant. Notably, configurations with $W$ values ranging from 4 to 8 consistently achieve higher PSNR values across all datasets.
  • Figure 4: Comparison of convergence speed between meta-learned initialization and random initialization for intra-domain weight transfer. The reconstruction PSNR is reported for the first 100 iterations (top row) and 2500 iterations (bottom row) for each dataset. The number of iterations corresponds to the number of gradient updates performed during training. The meta-learned approach converges faster, particularly in the initial training phase. Although random initialization eventually reaches a PSNR similar to that of the meta-learned approach, the meta-learned approach surpasses random initialization in PSNR after only 100 iterations.
  • Figure 5: Comparison of convergence speed between meta-learned initialization and random initialization for inter-domain weight transfer. The reconstruction PSNR is reported for the first 100 iterations (top row) and 2500 iterations (bottom row) for each dataset. The number of iterations corresponds to the number of gradient updates performed during the training process. The meta-learned approach demonstrates slightly faster convergence for datasets a) engine, c) Csafe heptane, and d) Vorticity magnitude. However, it provides minimal to no advantage for dataset e) Boston teapot and performs poorly for dataset b) Tacc turbulence.
  • ...and 5 more figures
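
The encoding described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the hash (XOR of coordinates multiplied by primes, modulo the table size) follows the Instant-NGP style spatial hash, and the per-level grid resolutions (`base_res`, `growth`) are hypothetical values picked for the example. The table shapes use the caption's $L=2$, $W=4$, $T=8$.

```python
import numpy as np

def hash_corner(i, j, table_size):
    # Assumed spatial hash: XOR of prime-multiplied integer coordinates.
    return ((i * 1) ^ (j * 2654435761)) % table_size

def encode(x, tables, base_res=2, growth=2.0):
    """Encode a normalized 2D point x in [0, 1]^2.

    tables: list of L arrays of shape (T, W), one hash table per level.
    Returns a vector of length L * W.
    """
    features = []
    for level, table in enumerate(tables):
        T, W = table.shape
        res = int(base_res * growth ** level)   # grid resolution at this level
        pos = np.asarray(x, dtype=np.float64) * res
        lo = np.floor(pos).astype(int)          # lower-left corner of the cell
        frac = pos - lo                         # position inside the cell
        feat = np.zeros(W)
        for di in (0, 1):
            for dj in (0, 1):
                # Bilinear weight of this corner.
                wi = frac[0] if di else 1.0 - frac[0]
                wj = frac[1] if dj else 1.0 - frac[1]
                idx = hash_corner(int(lo[0]) + di, int(lo[1]) + dj, T)
                feat += wi * wj * table[idx]
        features.append(feat)
    # Concatenate the interpolated features from all levels.
    return np.concatenate(features)

rng = np.random.default_rng(0)
L, W, T = 2, 4, 8
tables = [rng.uniform(-1e-4, 1e-4, size=(T, W)) for _ in range(L)]
encoded = encode([0.3, 0.7], tables)  # vector of length L * W
```

In a full model the table entries would be trainable and the encoded vector would be passed to a small MLP that predicts the voxel intensity. A volumetric version would use eight corners and trilinear weights instead of four corners and bilinear weights.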