MeshFeat: Multi-Resolution Features for Neural Fields on Meshes

Mihir Mahajan; Florian Hofherr; Daniel Cremers

TL;DR

The paper addresses the inference cost of neural fields on 3D meshes by introducing MeshFeat, a mesh-native multi-resolution feature encoding. It uses mesh simplification to build per-vertex feature sets at several resolutions and accumulates them on the finest resolution, where a small MLP decodes them. Storing the spatial information in the features rather than in the decoder enables significantly faster inference while maintaining fidelity in texture reconstruction and BRDF estimation. Because the features are attached to the vertices, the encoding carries over naturally to deforming meshes. Compared to state-of-the-art frequency-encoded methods, the approach achieves competitive reconstruction quality with substantial speedups, and it performs robustly on deforming meshes and calibrated BRDF tasks. The work suggests further avenues, such as more texture-adaptive multi-resolution strategies and intrinsic mesh encodings for broader signal representations on dynamic geometry.

Abstract

Parametric feature grid encodings have gained significant attention as an encoding approach for neural fields since they allow for much smaller MLPs, which significantly decreases the inference time of the models. In this work, we propose MeshFeat, a parametric feature encoding tailored to meshes, for which we adapt the idea of multi-resolution feature grids from Euclidean space. We start from the structure provided by the given vertex topology and use a mesh simplification algorithm to construct a multi-resolution feature representation directly on the mesh. The approach allows the usage of small MLPs for neural fields on meshes, and we show a significant speed-up compared to previous representations while maintaining comparable reconstruction quality for texture reconstruction and BRDF representation. Given its intrinsic coupling to the vertices, the method is particularly well-suited for representations on deforming meshes, making it a good fit for object animation.
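
The lookup that such an encoding implies can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `encode_point`, the array layout, and the toy values are assumptions. The structure (gather per-resolution features through vertex mappings, sum them at the triangle's vertices, interpolate barycentrically) follows the overview given in the Figure 2 caption. The small MLP decoder is omitted.

```python
import numpy as np

def encode_point(features, mappings, triangle, bary):
    """Return the feature encoding phi(x) for one surface point.

    Hypothetical layout (an assumption made for this sketch):
    features: list of arrays; features[i] has shape (n_i, d) at resolution i
    mappings: list of int arrays; mappings[i][v] is the index of the
              resolution-i vertex assigned to original vertex v
    triangle: (u, v, w) vertex indices of the triangle containing x
    bary:     barycentric coordinates of x in that triangle (sum to 1)
    """
    tri = np.asarray(triangle)
    # Gather the features of u, v, w at every resolution and sum them.
    per_vertex = sum(f[m[tri]] for f, m in zip(features, mappings))  # (3, d)
    # Barycentric interpolation of the accumulated vertex features.
    return np.asarray(bary, dtype=float) @ per_vertex  # (d,)

# Toy data: 4 original vertices, a fine level (identity mapping) and a
# coarse level in which vertices {0, 1} and {2, 3} share a feature vector.
features = [
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[10.0, 10.0], [20.0, 20.0]]),
]
mappings = [np.array([0, 1, 2, 3]), np.array([0, 0, 1, 1])]

phi = encode_point(features, mappings, triangle=(0, 1, 2), bary=(0.5, 0.25, 0.25))
print(phi)
```

Since the features are indexed by vertex rather than by position in space, moving the vertices leaves the lookup unchanged, which is consistent with the abstract's remark about deforming meshes.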

Paper Structure (38 sections, 9 equations, 14 figures, 7 tables)

This paper contains 38 sections, 9 equations, 14 figures, 7 tables.

Introduction
Related Work
Neural Fields
Input Encodings
Input Encodings for Neural Fields on Meshes
Multi-Resolution Approaches on Meshes
Method
Multi-Resolution Feature Encoding
Mesh Simplification
Multi-Resolution Strategy
Feature Interpolation
Feature Regularization
Model Architecture and Training Details
Experiments
Texture Reconstruction from Multi-View Images
...and 23 more sections

Figures (14)

Figure 1: We present MeshFeat, a parametric encoding strategy for neural fields on meshes. We propose a multi-resolution strategy based on mesh simplification to enhance the efficiency of the encoding. Our approach allows for much smaller MLPs than previous frequency-based encodings, resulting in significantly faster inference times. We evaluate the method for texture reconstruction and BRDF estimation and demonstrate that the encoding is well suited to represent signals on deforming meshes.
Figure 2: Overview of our multi-resolution feature approach on the mesh. To get the feature encoding $\phi(x)$ for a point $x$ on the original mesh, we determine the vertices $u, v, w$ of the respective triangle. Using the mappings $m^{(i)}$, we gather the corresponding features from the different resolutions. By summing them, we obtain the features $\phi_u, \phi_v, \phi_w$ at the vertices in the original mesh. Barycentric interpolation of these vertex features then yields the final feature encoding $\phi(x)$.
Figure 3: Qualitative results for texture reconstruction from multi-view images on the cat. Our method enables high-quality reconstructions, matching state-of-the-art methods in visual fidelity while offering a significant speedup. Because the baseline methods are based on frequency encodings, they over-smooth intricate details around the eye, which only our method can capture. Furthermore, NeuTex is unable to capture rapidly changing colors, such as those on the mouth of the cat, and shows distortions inside the ear.
Figure 4: Validation PSNR over training time in epochs. Our multi-resolution feature encoding leads to higher reconstruction quality despite having fewer parameters than a single-resolution encoding. For the latter, we use the finest resolution and arrange parameters the same way as in the multi-resolution approach. While a higher feature dimension $d$ leads to faster convergence in the single-resolution setting, it does not improve the reconstruction quality despite the increased number of parameters.
Figure 5: Qualitative results of our method for texture reconstruction with and without the regularization based on the mesh Laplacian. The results on the left, without the regularization, show visual artifacts around the ear and on the shoe and arm. These artifacts are a direct result of sparse training data, which leaves some feature vectors untrained. Our regularization acts as a smoothing term that diffuses feature information into the unsupervised areas, significantly reducing the noise.
...and 9 more figures
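
To illustrate how a Laplacian-based term like the one described in the Figure 5 caption can act as a smoothing penalty on vertex features, here is a small NumPy sketch. It rests on assumptions: it uses a uniform graph Laplacian and the quadratic energy $\mathrm{tr}(F^\top L F)$, while the Laplacian weights and the exact loss used in the paper are not stated on this page. The helper names `uniform_laplacian` and `laplacian_penalty` are hypothetical.

```python
import numpy as np

def uniform_laplacian(n_vertices, faces):
    """Dense uniform graph Laplacian L = D - A built from triangle faces.

    Assumption for this sketch: unit edge weights instead of, e.g.,
    cotangent weights.
    """
    A = np.zeros((n_vertices, n_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def laplacian_penalty(features, L):
    """Smoothness energy trace(F^T L F), i.e. the sum over mesh edges
    of the squared difference between neighbouring feature vectors."""
    return float(np.trace(features.T @ L @ features))

# Two triangles sharing the edge (1, 2); one scalar feature per vertex.
faces = [(0, 1, 2), (1, 2, 3)]
L = uniform_laplacian(4, faces)

smooth = np.ones((4, 1))                      # constant features
rough = np.array([[0.0], [1.0], [1.0], [3.0]])

print(laplacian_penalty(smooth, L))  # 0.0: constant features are not penalised
print(laplacian_penalty(rough, L))   # 10.0: 1 + 0 + 1 + 4 + 4 over the five edges
```

Adding such a term to the training loss pulls a feature vector that receives no gradient from the data towards the values of its neighbours, which matches the diffusion behaviour the caption describes.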
