Beyond Positional Encoding: A 5D Spatio-Directional Hash Encoding

Philippe Weier, Lukas Bode, Philipp Slusallek, Adrián Jarabo, Sébastien Speierer

TL;DR

This work proposes a new spatio-directional neural encoding that is compact and efficient and supports all-frequency signals in both space and direction. Applied to neural path guiding, this five-dimensional encoding outperforms the state of the art by up to a factor of 2 in terms of variance reduction.

Abstract

In this work, we propose a new spatio-directional neural encoding that is compact and efficient, and supports all-frequency signals in both space and direction. Current learnable encodings focus on Cartesian orthonormal spaces, which have been shown to be useful for representing high-frequency signals in the spatial domain. However, directly applying these encodings in the directional domain results in distortions, singularities, and discontinuities. As a result, most related works have used more traditional encodings for the directional domain, which lack the expressivity of learnable neural encodings. We address this by proposing a new angular encoding that generalizes the hash-grid approach from Müller et al. [2022] to the directional domain by encoding directions using a hierarchical geodesic grid. Each vertex in the geodesic grid stores a learnable latent parameter, which is used to feed a neural network. Armed with this directional encoding, we propose a five-dimensional encoding for spatio-directional signals. We demonstrate that both encodings significantly outperform other hash-based alternatives. We apply our five-dimensional encoding in the context of neural path guiding, outperforming the state of the art by up to a factor of 2 in terms of variance reduction for the same number of samples.
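To make the directional lookup concrete, the following is a minimal single-level sketch, not the authors' implementation. It assumes the coarsest geodesic level is a plain icosahedron with one feature vector per vertex (direct indexing, no hashing, no subdivision), and it assumes barycentric coordinates are obtained by projecting the direction onto the plane of the enclosing triangle; the abstract does not specify either detail. The names `encode_direction`, `VERTICES`, and `FACES` are hypothetical.

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio, used to build the icosahedron


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


# 12 icosahedron vertices projected onto the unit sphere.
VERTICES = [normalize(v) for v in [
    (-1, PHI, 0), (1, PHI, 0), (-1, -PHI, 0), (1, -PHI, 0),
    (0, -1, PHI), (0, 1, PHI), (0, -1, -PHI), (0, 1, -PHI),
    (PHI, 0, -1), (PHI, 0, 1), (-PHI, 0, -1), (-PHI, 0, 1),
]]

# 20 triangular faces as vertex-index triples.
FACES = [
    (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
    (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
    (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
    (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
]


def det3(a, b, c):
    """Scalar triple product a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def encode_direction(d, features, eps=1e-9):
    """Return (face, barycentric weights, interpolated feature) for direction d.

    `features` holds one feature vector per icosahedron vertex (assumption:
    direct indexing at this coarse level).
    """
    d = normalize(d)
    for face in FACES:
        v0, v1, v2 = (VERTICES[i] for i in face)
        det = det3(v0, v1, v2)
        # Solve d = a*v0 + b*v1 + c*v2 with Cramer's rule.
        w = (det3(d, v1, v2) / det, det3(v0, d, v2) / det, det3(v0, v1, d) / det)
        if min(w) >= -eps:  # d lies inside the cone spanned by this triangle
            s = sum(w)
            beta = tuple(x / s for x in w)  # planar barycentric coordinates
            dim = len(features[0])
            feat = [sum(beta[k] * features[face[k]][j] for k in range(3))
                    for j in range(dim)]
            return face, beta, feat
    raise ValueError("direction not covered by any face")


if __name__ == "__main__":
    feats = [[float(i), 1.0] for i in range(12)]
    print(encode_direction((0.3, -0.2, 0.9), feats))
```

A full encoding would repeat this lookup on each recursively subdivided level, switch to hashed indexing once a level has more vertices than the feature table, and concatenate the per-level features before the MLP.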

Paper Structure (19 sections, 6 equations, 8 figures, 1 table, 2 algorithms)

Figures (8)

  • Figure 1: The hash-sphere encoding. For an input direction $\mathbf{d}$: (1) We traverse a hierarchy of $L$ recursively-subdivided icosahedral grids defined in the unit sphere, identifying the enclosing triangle at each level. (2) At each level $l$, we retrieve learnable features $\theta_l[ \Phi_l(\mathbf{v}) ]$ from the triangle's three vertices (using direct indexing for coarse levels, and hashing for fine levels) and interpolate them using barycentric coordinates $\boldsymbol{\beta}_l$. (3) Features from all levels are concatenated into $\mathbf{f}(\mathbf{d})$. (4) A small MLP maps the concatenated feature vector $\mathbf{f}(\mathbf{d})$ to the final output.
  • Figure 2: Quality vs. memory trade-off representing an HDR environment map. We compare our hash-sphere against 2D and 3D hash-grid variants. The 2D hash-grid (polar parameterization) achieves comparable quality at mid-latitudes but suffers from severe distortions near the poles (inset). The 3D hash-grid (Cartesian) avoids polar artifacts but introduces interpolation-related artifacts due to working on a sub-optimal space for directional signals. Our hash-sphere provides consistent angular resolution across the sphere with an intuitive relationship between subdivision levels and frequency content. Memory includes both encoding parameters and MLP weights. Additional analyses with other environment maps can be found in the supplemental material.
  • Figure 3: The hash-grid-sphere encoding. For an input position-direction pair $(\mathbf{x}, \mathbf{d})$: At each level $l$, we locate both the enclosing spatial voxel (with 8 corners $C_{\mathbf{x},l}$) and the enclosing triangle in the sphere (with 3 vertices $V_{\mathbf{d},l}$). We retrieve learnable parameters for all $8 \times 3 = 24$ corner-vertex pairs (12 shown here), using either direct indexing or the joint hash function $h_{\text{joint}}$ depending on the grid size. Parameters are interpolated using the product of trilinear weights $w_\mathbf{c}$ and barycentric coordinates $\beta_\mathbf{v}$. Features from all levels are then concatenated and, typically, passed to an MLP. Compared to the hash-sphere (Figure 1), the key addition is the coupling of spatial and directional grids through joint indexing, enabling compact representation of 5D spatio-directional signals.
  • Figure 4: Error vs. memory for radiance field reconstruction on the Phone scene. We report reconstruction error for both training and novel views across three encodings. The 3D hash-grid + SH cannot capture high-frequency view dependence, producing blurred results. The 6D hash-grid overfits training views but fails on novel views due to ill-defined directional interpolation. Our hash-grid-sphere achieves low error on both training and novel views, demonstrating meaningful generalization. Corresponding images for the highlighted configurations are shown in \ref{fig:5d_encoding_images}.
  • Figure 5: Equal-sample neural incident radiance caching comparison. Both approaches perform similarly for simple diffuse indirect lighting (top). However, for complex indirect lighting with glossy materials (bottom), Rath et al. produce splotchy artifacts, while our encoding robustly handles high-frequency view-dependent indirect illumination.
  • ...and 3 more figures
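
The joint interpolation described in the Figure 3 caption can be sketched for a single level as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the caption names a joint hash $h_{\text{joint}}$ but does not define it, so `joint_hash` below is a hypothetical stand-in that extends an XOR-of-products spatial hash in the style of Müller et al. [2022] with one extra term for the sphere vertex index, and the multiplier constants are illustrative only. The enclosing triangle (`tri_vertex_ids`) and its barycentric coordinates (`beta`) are assumed to be given, and hashing is always used even where direct indexing would apply.

```python
import math
from itertools import product

# Illustrative multipliers; the constants used by the paper are not given here.
PRIMES = (1, 2654435761, 805459861, 3674653429)


def joint_hash(corner, vertex_id, table_size):
    """Hypothetical joint hash of a voxel corner (ix, iy, iz) and a sphere vertex id."""
    h = 0
    for value, prime in zip((*corner, vertex_id), PRIMES):
        h ^= (value * prime) & 0xFFFFFFFF
    return h % table_size


def trilinear_weights(x, resolution):
    """Return the 8 (corner, weight) pairs of the voxel enclosing x in [0, 1]^3."""
    scaled = [c * resolution for c in x]
    base = [min(int(math.floor(s)), resolution - 1) for s in scaled]
    frac = [s - b for s, b in zip(scaled, base)]
    pairs = []
    for offsets in product((0, 1), repeat=3):
        w = 1.0
        for o, f in zip(offsets, frac):
            w *= f if o else 1.0 - f
        corner = tuple(b + o for b, o in zip(base, offsets))
        pairs.append((corner, w))
    return pairs


def joint_lookup(x, resolution, tri_vertex_ids, beta, table):
    """Blend the 8 x 3 = 24 corner-vertex features with weights w_c * beta_v."""
    dim = len(table[0])
    feat = [0.0] * dim
    for corner, w_c in trilinear_weights(x, resolution):
        for vertex_id, b_v in zip(tri_vertex_ids, beta):
            row = table[joint_hash(corner, vertex_id, len(table))]
            weight = w_c * b_v
            for j in range(dim):
                feat[j] += weight * row[j]
    return feat


if __name__ == "__main__":
    table = [[float(i % 7), 1.0] for i in range(64)]
    print(joint_lookup((0.3, 0.7, 0.55), 4, (0, 5, 11), (0.2, 0.3, 0.5), table))
```

Because the trilinear weights and the barycentric coordinates each sum to one, the 24 product weights also sum to one, so a table with identical rows is reproduced exactly; this is a quick sanity check for any implementation of the joint interpolation.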