Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

TL;DR

Alpha Invariance identifies a scale-induced ambiguity in neural radiance fields: when scene distances scale by a factor $k$, the local density should scale by $1/k$ to keep rendered colors invariant. The authors propose two practical remedies: parameterizing distance and volume densities in log space (via $\alpha = 1 - \exp(-\exp(x) d)$ and related forms) and a discretization-agnostic high-transmittance initialization to ensure transparent rays across scales. Through extensive experiments across Vanilla NeRF, DVGO, Plenoxels, TensoRF, and Nerfacto, they demonstrate that the proposed recipe yields robust view synthesis across a wide range of scene sizes and mitigates common optimization failures. The work provides a principled framework and actionable initialization guidelines for scale-robust NeRF density modeling, improving reliability in real-world applications.
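
The log-space form mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `alpha_from_logs` is hypothetical, and it assumes the network output $x$ is read as a log density, so that $\exp(x)\,d = \exp(x + \log d)$.

```python
import numpy as np

def alpha_from_logs(x, log_d):
    """Opacity alpha = 1 - exp(-exp(x) * d), with x a log density and log_d = log(d).

    Hypothetical helper for illustration only.
    """
    # Add the logs first, then exponentiate once: exp(x) * d == exp(x + log_d).
    z = np.asarray(x, dtype=np.float64) + np.asarray(log_d, dtype=np.float64)
    # -expm1(-t) equals 1 - exp(-t) but keeps precision when t is very small.
    return -np.expm1(-np.exp(z))

# Illustrative values (assumptions, not taken from the paper).
x, d, k = 0.5, 0.01, 100.0
alpha = alpha_from_logs(x, np.log(d))
# Scaling distance by k and shifting the log density by -log(k) leaves alpha unchanged.
alpha_scaled = alpha_from_logs(x - np.log(k), np.log(d * k))
```

In this form a change of scene scale is a constant shift in log space, which is one way to see why the two quantities can cancel before exponentiation.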

Abstract

Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initialization strategy to guarantee high ray transmittance. We revisit a few popular radiance field models and find that these systems use various heuristics to deal with issues arising from scene scaling. We test their behaviors and show our recipe to be more robust.
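
One possible reading of a discretization-agnostic, high-transmittance initialization is sketched below. It rests on assumptions that are not stated in the abstract: the initial density is uniform along the ray, the activation is $\exp$, and the target transmittance of 0.99 is an arbitrary illustrative value. The helper names `init_log_density` and `ray_transmittance` are hypothetical.

```python
import numpy as np

def init_log_density(ray_length, target_transmittance=0.99):
    """Hypothetical helper: log density x0 such that a ray of the given length,
    filled with uniform density exp(x0), has the target total transmittance."""
    # T = exp(-sigma * L)  =>  sigma = -log(T) / L  =>  x0 = log(sigma).
    return np.log(-np.log(target_transmittance) / ray_length)

def ray_transmittance(x0, ray_length, n_samples):
    """Total transmittance after compositing n_samples equal-length intervals."""
    d = np.full(n_samples, ray_length / n_samples)
    alpha = -np.expm1(-np.exp(x0) * d)   # alpha_i = 1 - exp(-sigma * d_i)
    return np.prod(1.0 - alpha)          # product of per-interval transparencies

# The result does not depend on how finely the ray is sampled or on its length.
results = {
    (length, n): ray_transmittance(init_log_density(length), length, n)
    for length in (0.1, 1.0, 100.0)
    for n in (8, 64, 1024)
}
```

Because the per-interval transparencies multiply to $\exp(-\sigma L)$ for any partition of the ray, fixing the offset from the total ray length rather than from a single interval length keeps the initial ray transparent regardless of sample count.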

Paper Structure (22 sections, 8 equations, 10 figures, 9 tables, 1 algorithm)

Figures (10)

  • Figure 1: A discretized view of volume rendering. Top: a ray is cut into intervals, each with a density $\sigma_i \ge 0$ and interval length $d_i$. Bottom: illustration of the weight given to the 3rd interval, computed through alpha compositing. The rendered color is obtained by weighting all the interval colors with their weights $w_i$. If we scale each $d_i$ by a constant $k$, scaling each $\sigma_i$ by $\frac{1}{k}$ renders an identical color.
  • Figure 2: $\alpha$ as a function of the raw input $x$ and the interval length $d$. We focus on $\alpha$ as a function of $x$, with different activation functions $\sigma(x)$, for a set of fixed interval lengths $d$. The $\mathtt{exp}$ activation (right plot) leads to a smooth, sigmoid-like transition from low to high $\alpha$ values regardless of the interval length $d$. The function $\exp(-\exp(-x))$ is the CDF of the Gumbel distribution, and it is a numerically stable recipe that we recommend over $\mathtt{trunc\_exp}$ because log density and log distance naturally cancel each other out before exponentiation.
  • Figure 3: Distribution of volume density $\sigma$ in the lego bulldozer scene, queried via a uniformly sampled dense grid of points from both the coarse (left columns) and fine (right columns) MLP networks of vanilla-NeRF for different $k$. Color represents the average $\sigma$ value in each percentile of the sorted $\sigma$ distribution. The fine MLPs produce larger $\sigma$ than the coarse MLPs, and as $k$ increases, the magnitude of $\sigma$ decreases for both networks.
  • Figure 4: Values of volume density $\sigma$ on the object surface, with models trained under different scene scalings $k$. On each ray, the surface point is defined as the 50th-percentile location of the volume rendering CDF. We annotate a few prominent points, and also produce two $\sigma$ division images that show the overall ratio of the numerical range of $\sigma$ at different scaling factors. The Blender scenes are trained with vanilla NeRF, and the Mip-NeRF 360 scenes are trained with Nerfacto. The ratio of $\sigma$ is empirically close to $1/k$.
  • Figure 5: CDF of the $\sigma$ distributions produced by Plenoxels on the T-Rex scene from the LLFF dataset with different learning rate schedules on the $\sigma$ voxels. The default high learning rate schedule [red] is needed to produce large $\sigma$ values and high PSNR. Lower learning rates lead to smaller $\sigma$ and ultimately worse performance.
  • ...and 5 more figures
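
The invariance described in the Figure 1 caption can be checked numerically. The sketch below assumes the standard alpha-compositing weights $w_i = \alpha_i \prod_{j<i} (1 - \alpha_j)$ with $\alpha_i = 1 - \exp(-\sigma_i d_i)$. The densities, interval lengths, and scale factor are made-up illustrative values, and `render_weights` is a hypothetical helper name.

```python
import numpy as np

def render_weights(sigma, d):
    """Alpha-compositing weights w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    sigma = np.asarray(sigma, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    alpha = 1.0 - np.exp(-sigma * d)
    # Transmittance reaching interval i: product of (1 - alpha_j) over j < i.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return alpha * trans

sigma = np.array([0.1, 2.0, 5.0, 0.5])   # illustrative densities
d = np.array([0.2, 0.1, 0.3, 0.4])       # illustrative interval lengths
k = 10.0                                 # illustrative scene scale factor

w = render_weights(sigma, d)
# Scale every interval by k and every density by 1/k: the weights are unchanged.
w_scaled = render_weights(sigma / k, d * k)
# Halving the scene size while doubling the densities is the same statement with k = 1/2.
w_halved = render_weights(2.0 * sigma, d / 2.0)
```

Since each $\alpha_i$ depends only on the product $\sigma_i d_i$, any color rendered as a weighted sum of interval colors is unchanged as well.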