Affine-Equivariant Kernel Space Encoding for NeRF Editing
Mikołaj Zieliński, Krzysztof Byrski, Tomasz Szczepanik, Dominik Belter, Przemysław Spurek
TL;DR
Affine-Equivariant Kernel Space Encoding (EKS) redefines the NeRF latent space as a field of anisotropic Gaussian kernels, enabling localized, deformation-aware editing while preserving rendering fidelity. Features at a query point are interpolated from nearby Gaussians using Mahalanobis-distance-based weights, and a Ray-Traced Gaussian Proximity Search keeps neighborhood queries consistent under affine transformations. During training, features from a hash grid are distilled into the kernel field, yielding a grid-free representation that can be edited through Gaussian tetrahedra bound to meshes. Empirical results on NeRF-Synthetic, Mip-NeRF 360, and physics-based benchmarks show that, relative to prior editable NeRF methods, EKS achieves competitive reconstruction quality and superior editing robustness, including physics-driven scene manipulation.
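To make the interpolation step concrete, here is a minimal NumPy sketch of Mahalanobis-weighted feature interpolation over a set of anisotropic Gaussian kernels. It assumes normalized weights of the form exp(-d²/2), where d is the Mahalanobis distance from the query point to each kernel, and it replaces the Ray-Traced Gaussian Proximity Search with a brute-force k-nearest selection. The function name `kernel_features` and the parameter `k` are hypothetical and not taken from the EKS code.

```python
import numpy as np


def kernel_features(x, means, covs, feats, k=None):
    """Interpolate a feature vector at query point x from anisotropic Gaussian kernels.

    x: (3,) query point; means: (N, 3) kernel centers; covs: (N, 3, 3) SPD covariances;
    feats: (N, F) per-kernel latent features; k: optional number of nearest kernels to keep.
    Hypothetical helper: the exp(-d^2/2) weighting is an assumption for illustration.
    """
    diffs = x - means                                        # (N, 3) offsets from each center
    inv_covs = np.linalg.inv(covs)                           # (N, 3, 3) precision matrices
    d2 = np.einsum("ni,nij,nj->n", diffs, inv_covs, diffs)   # squared Mahalanobis distances
    if k is not None:
        # Brute-force stand-in for the ray-traced proximity search (assumption)
        nearest = np.argsort(d2)[:k]
        d2, feats = d2[nearest], feats[nearest]
    weights = np.exp(-0.5 * d2)                              # Gaussian falloff (assumed form)
    weights = weights / (weights.sum() + 1e-12)              # normalize so weights sum to ~1
    return weights @ feats                                   # (F,) interpolated feature


# Two kernels with identity covariance, one on each side of the query point
means = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
covs = np.stack([np.eye(3), np.eye(3)])
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([1.0, 0.0, 0.0])

print(kernel_features(x, means, covs, feats))        # equal distances, so roughly [0.5, 0.5]

# Stretch the second kernel along x: it is now "closer" in Mahalanobis terms
covs_aniso = np.stack([np.eye(3), np.diag([4.0, 1.0, 1.0])])
print(kernel_features(x, means, covs_aniso, feats))  # second feature now carries more weight
```

The second call illustrates why anisotropy matters: the query point sits at the same Euclidean distance from both centers, but the kernel elongated toward it receives the larger weight.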
Abstract
Neural scene representations achieve high-fidelity rendering by encoding 3D scenes as continuous functions, but their latent spaces are typically implicit and globally entangled, making localized editing and physically grounded manipulation difficult. While several works introduce explicit control structures or point-based latent representations to improve editability, these approaches often suffer from limited locality, sensitivity to deformations, or visual artifacts. In this paper, we introduce Affine-Equivariant Kernel Space Encoding (EKS), a spatial encoding for neural radiance fields that provides localized, deformation-aware feature representations. Instead of querying latent features directly at discrete points or grid vertices, our encoding aggregates features through a field of anisotropic Gaussian kernels, each defining a localized region of influence. This kernel-based formulation enables stable feature interpolation under spatial transformations while preserving continuity and high reconstruction quality. To preserve detail without sacrificing editability, we further propose a training-time feature distillation mechanism that transfers information from multi-resolution hash grid encodings into the kernel field, yielding a compact and fully grid-free representation at inference. This enables intuitive, localized scene editing directly via Gaussian kernels without retraining, while maintaining high-quality rendering. The code is available at https://github.com/MikolajZielinski/eks.
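The claim that kernel-based interpolation stays stable under spatial transformations can be checked numerically. If an edit applies an affine map x ↦ Ax + t, moving each kernel center to Aμ + t and each covariance to AΣAᵀ leaves every Mahalanobis distance unchanged, because (A(x − μ))ᵀ(AΣAᵀ)⁻¹(A(x − μ)) = (x − μ)ᵀΣ⁻¹(x − μ) for any invertible A. The sketch below assumes normalized exp(-d²/2) weights (an illustrative choice, not necessarily the paper's exact kernel); the helpers `interpolate` and `apply_affine` are hypothetical.

```python
import numpy as np


def interpolate(x, means, covs, feats):
    """Normalized Gaussian-weighted feature at x (hypothetical helper, assumed weighting)."""
    diffs = x - means                                                   # (N, 3) offsets
    d2 = np.einsum("ni,nij,nj->n", diffs, np.linalg.inv(covs), diffs)  # squared Mahalanobis
    w = np.exp(-0.5 * d2)
    return (w / (w.sum() + 1e-12)) @ feats


def apply_affine(A, t, means, covs):
    """Deform the kernel field by x -> A x + t: mu -> A mu + t, Sigma -> A Sigma A^T."""
    return means @ A.T + t, A @ covs @ A.T


rng = np.random.default_rng(0)
n_kernels, n_feat = 8, 4
means = rng.normal(size=(n_kernels, 3))
L = 0.3 * rng.normal(size=(n_kernels, 3, 3))
covs = L @ L.transpose(0, 2, 1) + 0.1 * np.eye(3)   # random symmetric positive-definite covariances
feats = rng.normal(size=(n_kernels, n_feat))        # per-kernel features stay fixed under the edit
x = rng.normal(size=3)

A = np.array([[1.5, 0.3, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 1.2]])  # invertible (det = 1.446)
t = np.array([0.5, -1.0, 2.0])

f_before = interpolate(x, means, covs, feats)
new_means, new_covs = apply_affine(A, t, means, covs)
f_after = interpolate(A @ x + t, new_means, new_covs, feats)
print(np.allclose(f_before, f_after))  # True: distances, and hence features, are unchanged
```

Because only kernel centers and covariances change, the per-kernel features can be reused as-is after the deformation; under these assumptions, no feature needs to be re-optimized for the transformed region.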
