Mesh Denoising Transformer

Wenbo Zhao, Xianming Liu, Deming Zhai, Junjun Jiang, Xiangyang Ji

TL;DR

This work tackles mesh denoising by addressing two core challenges: loss of multi-attribute information from single-modal representations and limited global feature aggregation. It introduces Local Surface Descriptor (LSD), a multimodal representation that encodes local geometry as image-like patches and spatial context as a point cloud, enabling effective Transformer modeling. The SurfaceFormer framework employs a dual-stream Geometric Encoder and Spatial Encoder, followed by a Denoising Transformer to achieve global feature aggregation and robust denoising, with a Vertex Refinement step that aligns denoised vertices to normals. Experiments on synthetic, Kinect, real-scanned, and reconstructed datasets show state-of-the-art performance in both objective metrics $E_a$ and $E_v$ and subjective quality, demonstrating strong generalization and practical applicability for diverse scanning pipelines.
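The Vertex Refinement step mentioned above moves vertices so that the mesh agrees with the denoised normals. This summary does not spell out the update rule, so the sketch below uses a classic iterative scheme from the mesh denoising literature as a stand-in: each vertex is shifted along the target normal of every incident face by the signed distance to the plane through that face's centroid. The function name `refine_vertices`, the iteration count, and the choice of this particular rule are assumptions, not details taken from the paper.

```python
import numpy as np

def refine_vertices(vertices, faces, target_normals, iterations=10):
    """Iteratively move vertices so face normals approach `target_normals`.

    Assumed update (not confirmed by the source):
        v_i <- v_i + 1/|F_i| * sum_k n_k * (n_k . (c_k - v_i))
    where F_i are the faces incident to vertex i, n_k is the target normal
    of face k and c_k is its current centroid.
    """
    v = np.asarray(vertices, dtype=float).copy()
    faces = np.asarray(faces, dtype=int)
    n = np.asarray(target_normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)

    # Number of faces incident to each vertex (guard against isolated vertices).
    counts = np.bincount(faces.ravel(), minlength=len(v)).astype(float)
    counts[counts == 0] = 1.0

    for _ in range(iterations):
        centroids = v[faces].mean(axis=1)              # (F, 3)
        delta = np.zeros_like(v)
        for corner in range(3):
            idx = faces[:, corner]
            # Signed distance from each corner vertex to its face's target plane.
            d = np.einsum("ij,ij->i", n, centroids - v[idx])
            # Accumulate per-vertex displacements; np.add.at handles repeated indices.
            np.add.at(delta, idx, d[:, None] * n)
        v += delta / counts[:, None]
    return v
```

On a two-triangle square with one corner lifted out of plane and both target normals set to the z-axis, repeated updates flatten the square while leaving the x and y coordinates untouched.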

Abstract

Mesh denoising, aimed at removing noise from input meshes while preserving their feature structures, is a practical yet challenging task. Despite the remarkable progress in learning-based mesh denoising methodologies in recent years, their network designs often encounter two principal drawbacks: a dependence on single-modal geometric representations, which fall short in capturing the multifaceted attributes of meshes, and a lack of effective global feature aggregation, hindering their ability to fully understand the mesh's comprehensive structure. To tackle these issues, we propose SurfaceFormer, a pioneering Transformer-based mesh denoising framework. Our first contribution is the development of a new representation known as Local Surface Descriptor, which is crafted by establishing polar systems on each mesh face, followed by sampling points from adjacent surfaces using geodesics. The normals of these points are organized into 2D patches, mimicking images to capture local geometric intricacies, whereas the poles and vertex coordinates are consolidated into a point cloud to embody spatial information. This advancement surmounts the hurdles posed by the irregular and non-Euclidean characteristics of mesh data, facilitating a smooth integration with Transformer architecture. Next, we propose a dual-stream structure consisting of a Geometric Encoder branch and a Spatial Encoder branch, which jointly encode local geometry details and spatial information to fully explore multimodal information for mesh denoising. A subsequent Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation through self-attention operators. Our experimental evaluations demonstrate that this novel approach outperforms existing state-of-the-art methods in both objective and subjective assessments, marking a significant leap forward in mesh denoising.
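To make the polar sampling behind the Local Surface Descriptor more concrete, here is a minimal sketch for the special case of a flat surface, where geodesics are straight rays. It places a grid of samples around a pole and returns them in a 2D array indexed by (radius, angle), which is the image-like layout that the normals would fill. The function name `polar_grid`, the uniform spacing in radius and angle, and the exclusion of the zero radius are assumptions made for illustration; on a real mesh each ray would be replaced by a geodesic traced across faces.

```python
import numpy as np

def polar_grid(pole, axis, normal, radius, t_s):
    """Sample t_s x t_s points around `pole` in the plane orthogonal to `normal`.

    Row index = radial step, column index = angular step (assumed layout).
    """
    pole = np.asarray(pole, dtype=float)
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)

    # Project the polar axis into the plane and build an orthonormal 2D frame.
    axis = np.asarray(axis, dtype=float)
    axis = axis - axis.dot(normal) * normal
    axis = axis / np.linalg.norm(axis)
    ortho = np.cross(normal, axis)

    radii = radius * np.arange(1, t_s + 1) / t_s        # skip r = 0
    angles = 2.0 * np.pi * np.arange(t_s) / t_s
    r, phi = np.meshgrid(radii, angles, indexing="ij")  # both (t_s, t_s)

    direction = np.cos(phi)[..., None] * axis + np.sin(phi)[..., None] * ortho
    return pole + r[..., None] * direction              # (t_s, t_s, 3)

pts = polar_grid(pole=(0, 0, 0), axis=(1, 0, 0), normal=(0, 0, 1),
                 radius=1.0, t_s=20)
# Distance between angular neighbours on each ring: grows with the radius,
# so a uniform polar grid is denser near the pole than far from it.
ring_gap = np.linalg.norm(pts[:, 1] - pts[:, 0], axis=-1)
```

Because the returned array already has a regular `(t_s, t_s)` shape, the per-sample normals can be stacked into a patch and handled like image pixels, even though the underlying mesh is irregular.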

Paper Structure (25 sections, 16 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: The overall framework of the proposed method, which includes four main steps: Patch Generation, Surface Representation, Mesh Denoising and Vertex Refinement.
  • Figure 2: The patch generation process of Pyramid when $T_f=240$. (a) $f_0$ is employed as the initial patch center, highlighted in red. (b) K-ring faces surrounding $f_0$ are incorporated into the patch until the number of faces reaches $T_f$. (c) The nearest unvisited face to $f_0$ becomes the next center, highlighted in blue. Another patch is built around it. (d) This procedure continues until all the faces are visited, and the patch centers are highlighted in red.
  • Figure 3: Sampling $20\times20$ points (a) with uniformly distributed Cartesian coordinates, where the sampling density is consistent; (b) with uniformly distributed polar coordinates, where the sampling density becomes sparser as the radius increases.
  • Figure 4: An example of sampling by shooting geodesics. The polar coordinate system is built on $f_i$, where ${\mathbf c}_i$ and ${\mathbf d}_i$ are the pole and the polar axis, respectively. (a) The geodesic starts from ${\mathbf c}_i$ at angle $\varphi$. (b) When the geodesic reaches edge $e_{ij}$, $f_j$ is rotated around $e_{ij}$ into the plane of $f_i$ so that propagation can continue. (c) The geodesic stops propagating when its length equals $r$, and its end point is taken as the sampling point.
  • Figure 5: An illustration of the LSD generation with $T_s=10$.
  • ...and 9 more figures
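
The patch generation procedure described in the Figure 2 caption can be sketched as a breadth-first traversal over faces. The caption leaves several details open, so the following are assumptions: two faces count as neighbours when they share a vertex, "nearest" means Euclidean distance between face centroids measured from the most recent patch center, and patches may overlap. The names `build_face_adjacency`, `grow_patch`, and `generate_patches` are hypothetical.

```python
from collections import deque
import numpy as np

def build_face_adjacency(faces):
    """Faces are neighbours if they share at least one vertex (assumed)."""
    vertex_to_faces = {}
    for f, tri in enumerate(faces):
        for v in tri:
            vertex_to_faces.setdefault(int(v), set()).add(f)
    return [set().union(*(vertex_to_faces[int(v)] for v in tri)) - {f}
            for f, tri in enumerate(faces)]

def grow_patch(center, adjacency, t_f):
    """Add faces ring by ring around `center` until t_f faces are collected."""
    patch, seen, queue = [], {center}, deque([center])
    while queue and len(patch) < t_f:
        f = queue.popleft()
        patch.append(f)
        for g in sorted(adjacency[f]):
            if g not in seen:
                seen.add(g)
                queue.append(g)
    return patch

def generate_patches(vertices, faces, t_f, start=0):
    """Cover the mesh with patches of up to t_f faces each."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    centroids = vertices[faces].mean(axis=1)
    adjacency = build_face_adjacency(faces)
    visited = np.zeros(len(faces), dtype=bool)

    centers, patches = [], []
    center = start
    while True:
        patch = grow_patch(center, adjacency, t_f)
        centers.append(center)
        patches.append(patch)
        visited[patch] = True
        if visited.all():
            break
        # Next center: the unvisited face whose centroid is closest
        # to the current center (assumed distance measure).
        dist = np.linalg.norm(centroids - centroids[center], axis=1)
        dist[visited] = np.inf
        center = int(np.argmin(dist))
    return centers, patches
```

Each new center is an unvisited face that becomes visited in the same iteration, so the loop always terminates. On a connected mesh with at least `t_f` faces every patch has exactly `t_f` faces, which gives the fixed-size inputs a Transformer expects.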