Grounding Continuous Representations in Geometry: Equivariant Neural Fields

David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Sharvaree Vadgama, Efstratios Gavves, Erik J Bekkers

TL;DR

The paper addresses the lack of geometric inductive biases in conditional neural fields by introducing Equivariant Neural Fields (ENFs) that ground latent conditioning in a geometry-informed latent point cloud. ENFs use a cross-attention mechanism conditioned on bi-invariant geometric attributes and enforce locality via a Gaussian window, yielding a steerability property that links transformations in the field to corresponding transformations in the latent. This grounding enables weight sharing over similar local patterns and supports downstream geometric reasoning, demonstrated across reconstruction, image/shape classification, segmentation, climate forecasting, and generative modeling, with code releases provided. The work advances continuous-field representations by reinserting locality and symmetry into CNFs, improving learning efficiency, generalization, and editing capabilities across diverse data modalities.

Abstract

Conditional Neural Fields (CNFs) are increasingly being leveraged as continuous signal representations, by associating each data-sample with a latent variable that conditions a shared backbone Neural Field (NeF) to reconstruct the sample. However, existing CNF architectures face limitations when using this latent downstream in tasks requiring fine-grained geometric reasoning, such as classification and segmentation. We posit that this results from a lack of explicit modelling of geometric information (e.g., locality in the signal or the orientation of a feature) in the latent space of CNFs. As such, we propose Equivariant Neural Fields (ENFs), a novel CNF architecture which uses a geometry-informed cross-attention to condition the NeF on a geometric variable--a latent point cloud of features--that enables an equivariant decoding from latent to field. We show that this approach induces a steerability property by which both field and latent are grounded in geometry and amenable to transformation laws: if the field transforms, the latent representation transforms accordingly--and vice versa. Crucially, this equivariance relation ensures that the latent is capable of (1) representing geometric patterns faithfully, allowing for geometric reasoning in latent space, and (2) weight-sharing over similar local patterns, allowing for efficient learning of datasets of fields. We validate these main properties in a range of tasks including classification, segmentation, forecasting, reconstruction and generative modelling, showing clear improvement over baselines with a geometry-free latent space. Code attached to the submission: https://github.com/Dafidofff/enf-jax. A clean and minimal repository: https://github.com/david-knigge/enf-min-jax.

Paper Structure (54 sections, 1 theorem, 11 equations, 17 figures, 12 tables, 2 algorithms)

Key Result

Lemma 1

A conditional neural field satisfies the steerability property iff it is bi-invariant, i.e., $\forall g \in G:\; f_\theta(g x; g z) = f_\theta(x; z).$
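
Both directions of the lemma follow from a one-line substitution. As a sketch, assume the steerability property is stated as $f_\theta(g^{-1}x; z) = f_\theta(x; gz)$ for all $g \in G$. This form is an assumption here, chosen to match the relation $L_g[f] \sim g \cdot z$ in the caption of Figure 1 with $L_g[f](x) = f(g^{-1}x)$. Applying bi-invariance with group element $g$ to the pair $(g^{-1}x, z)$ gives

$$f_\theta(g^{-1}x;\, z) = f_\theta(g\,g^{-1}x;\, g z) = f_\theta(x;\, g z).$$

Conversely, substituting $x \mapsto g x$ into the steerability property gives $f_\theta(x; z) = f_\theta(gx; gz)$, which is bi-invariance.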

Figures (17)

  • Figure 1: Equivariant Neural Fields (ENFs) ground Neural Fields (NeFs) in geometry using a latent point cloud. A latent set $z$ consisting of tuples $(p_i, \mathbf{c}_i)$ of pose information $p_i$ and context $\mathbf{c}_i$ is optimized by gradient descent so that $f_\theta(\cdot; z)$ reconstructs the field $f(\cdot)$. Due to their explicit positional grounding and locality, the latents retain important geometric features of the input field. The latent $z$ can then be used in downstream tasks, e.g. classification, segmentation, and geometric reasoning, where transformations in the field are mirrored in the latent representation through group actions $L_g[f] \sim g \cdot z$.
  • Figure 2: ENFs preserve transformations through their steerability property: if the field transforms with a group action $g$, the latents transform accordingly via the group action on the point cloud, $gz=\{(gp_i,\mathbf{c}_i)\}^N_{i=1}$.
  • Figure 3: Mean class and instance IoU ($\uparrow$) on ShapeNet.
  • Figure 4: A visual intuition for the proposed cross-attention between coordinate $x_m$ and latent $z=\{(p_i,\mathbf{c}_i)\}_{i=1}^N$. (a) The bi-invariant $\mathbf{a}_{m,i}$ is calculated between coordinate $x_m$ and pose $p_i$ as $p_i^{-1}x_m$. (b) The query function $\mathbf{q}$ transforms $\mathbf{a}_{m,i}$ into a query $\mathbf{q}_{m,i}$, and the key function $\mathbf{k}$ maps context vector $\mathbf{c}_i$ to key $\mathbf{k}_i$. Attention coefficients are calculated through a softmax over the products $\mathbf{q}_{m,i}^\top\mathbf{k}_i$. The softmax is taken over the $N$ latents, yielding $N$ attention coefficients $\operatorname{att}_{m,i}$, one for each latent $z_i$. (c) A value $\mathbf{v}_{m,i}$ for each latent-coordinate pair is calculated as a function $v$ of $\mathbf{c}_i$ and $\mathbf{a}_{m,i}$, and the resulting values are aggregated, weighted by their corresponding attention coefficients $\operatorname{att}_{m,i}$.
  • Figure 6: ERA5 reconstruction $T_{t}$-MSE$\downarrow$ and 1-hour forecasting $T_{t+1}$-MSE$\downarrow$. *MSE between ground truth observations at $T_t$ and $T_{t+1}$.
  • ...and 12 more figures
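
The cross-attention described in the Figure 4 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the group is taken to be translations (so $p_i^{-1}x_m = x_m - p_i$), the query, key and value functions are replaced by hypothetical plain linear maps (`W_q`, `W_k`, `W_v`), and the Gaussian window is assumed to enter as a squared-distance penalty on the attention logits with a hypothetical width `sigma`. The names `init_params` and `enf_cross_attention` are likewise illustrative.

```python
import numpy as np


def softmax(logits, axis):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def init_params(rng, coord_dim, context_dim, hidden_dim, out_dim):
    """Hypothetical linear maps standing in for the learned q, k and v functions."""
    return {
        "W_q": rng.normal(size=(coord_dim, hidden_dim)),
        "W_k": rng.normal(size=(context_dim, hidden_dim)),
        "W_v": rng.normal(size=(context_dim + coord_dim, out_dim)),
    }


def enf_cross_attention(x, poses, contexts, params, sigma=1.0):
    """Evaluate the field at coordinates x (M, d) given latents (poses (N, d), contexts (N, c))."""
    num_coords, num_latents = x.shape[0], poses.shape[0]

    # (a) Bi-invariant attribute a_{m,i} = p_i^{-1} x_m; for translations this is x_m - p_i.
    a = x[:, None, :] - poses[None, :, :]                     # (M, N, d)

    # (b) Queries come from the bi-invariants, keys from the context vectors.
    q = a @ params["W_q"]                                     # (M, N, h)
    k = contexts @ params["W_k"]                              # (N, h)
    logits = np.einsum("mnh,nh->mn", q, k) / np.sqrt(k.shape[-1])

    # Assumed form of the Gaussian window: penalize latents far from the coordinate.
    logits = logits - np.sum(a**2, axis=-1) / (2.0 * sigma**2)

    # Softmax over the N latents gives one coefficient att_{m,i} per latent.
    att = softmax(logits, axis=1)                             # (M, N)

    # (c) Values depend on both c_i and a_{m,i}; aggregate them with the attention weights.
    c_tiled = np.broadcast_to(
        contexts[None, :, :], (num_coords, num_latents, contexts.shape[-1])
    )
    v = np.concatenate([c_tiled, a], axis=-1) @ params["W_v"]  # (M, N, out)
    return np.einsum("mn,mno->mo", att, v)                    # (M, out)


# Small usage example with random latents and coordinates.
rng = np.random.default_rng(0)
params = init_params(rng, coord_dim=2, context_dim=4, hidden_dim=8, out_dim=3)
x = rng.uniform(-1.0, 1.0, size=(5, 2))
poses = rng.uniform(-1.0, 1.0, size=(6, 2))
contexts = rng.normal(size=(6, 4))
out = enf_cross_attention(x, poses, contexts, params)
```

Because the coordinates enter only through $x_m - p_i$, shifting both `x` and `poses` by the same vector leaves `out` unchanged. This is the bi-invariance condition of Lemma 1 specialized to translations.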

Theorems & Definitions (2)

  • Lemma 1
  • proof