
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes

Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai

TL;DR

Mesh Mamba introduces a unified state-space model for saliency prediction on both textured and non-textured meshes. It couples a graph-convolution encoder with a texture-aligned latent code map $E_{\varphi}(I)$, a subgraph embedding module, and a bidirectional SSM-based Mamba block. The approach preserves mesh topology while incorporating texture cues. It enables global context modeling over token sequences through diffusion and aggregation operations, and produces dense saliency maps through voting interpolation. The authors also introduce and evaluate a VR eye-tracking dataset comparing textured vs. non-textured meshes. It shows that texture cues substantially improve saliency prediction on textured meshes, while geometry dominates in the non-textured case. Across SAL3D and the proposed dataset, Mesh Mamba achieves state-of-the-art or competitive performance with linear computational scaling. These results highlight the value of multimodal feature integration for scalable, versatile 3D saliency modeling and for saliency-guided mesh simplification.
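One concrete reading of "texture-aligned" is to look up the latent code map $E_{\varphi}(I)$ at each vertex's UV coordinate. The lookup result is then fused with that vertex's geometric features. The sketch below does this with bilinear interpolation and concatenation. It is a minimal illustration under stated assumptions, not the paper's implementation. The assumptions are: the code map is stored as an H×W×C array, per-vertex UVs are available, v = 0 maps to the top row, and fusion is plain concatenation. `bilinear_sample` and `fuse_vertex_features` are hypothetical names.

```python
import numpy as np


def bilinear_sample(code_map: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Sample an (H, W, C) latent code map at per-vertex UV coords in [0, 1]^2.

    Returns an (N, C) array with one texture feature vector per vertex.
    Assumption: u runs along the width, v along the height, with v = 0 at the
    top row (flip v for textures stored bottom-up).
    """
    H, W, _ = code_map.shape
    # Map UV to continuous pixel coordinates.
    x = np.clip(uv[:, 0], 0.0, 1.0) * (W - 1)
    y = np.clip(uv[:, 1], 0.0, 1.0) * (H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets act as interpolation weights, shaped (N, 1) to broadcast over C.
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * code_map[y0, x0] + wx * code_map[y0, x1]
    bottom = (1 - wx) * code_map[y1, x0] + wx * code_map[y1, x1]
    return (1 - wy) * top + wy * bottom


def fuse_vertex_features(geom_feats: np.ndarray, code_map: np.ndarray,
                         uv: np.ndarray) -> np.ndarray:
    """Concatenate (N, G) geometric features with (N, C) aligned texture codes."""
    tex_feats = bilinear_sample(code_map, uv)
    return np.concatenate([geom_feats, tex_feats], axis=1)
```

For a 2×2 single-channel map holding 0, 1, 2, 3, sampling at UV (0.5, 0.5) returns their average, 1.5. In PyTorch the same lookup is usually done with `torch.nn.functional.grid_sample`, which expects sampling coordinates in [-1, 1].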

Abstract

Mesh saliency enhances the adaptability of 3D vision by identifying and emphasizing regions that naturally attract visual attention. To investigate how geometric structure and texture interact in shaping visual attention, we establish a comprehensive mesh saliency dataset. It is the first to systematically capture differences in saliency distribution between textured and non-textured visual conditions. We further introduce Mesh Mamba, a unified saliency prediction model based on a state space model (SSM) and designed to adapt to various mesh types. Mesh Mamba analyzes the geometric structure of the mesh while seamlessly incorporating texture features into the topological framework, ensuring coherence throughout appearance-enhanced modeling. More importantly, subgraph embedding and a bidirectional SSM allow the model to perform global context modeling over both local geometry and texture. This preserves the topological structure and improves the model's understanding of visual details and structural complexity. Extensive theoretical and empirical validation shows that our model improves performance across various mesh types. It also demonstrates high scalability and versatility, particularly in cross-validation across different visual features.
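A minimal scan over a sequence of subgraph tokens illustrates the bidirectional SSM. This is a sketch under simplifying assumptions, not the paper's Mamba block. It uses a time-invariant diagonal SSM, whereas Mamba's selective scan makes Δ, B, and C depend on the input. It uses the simplified discretization Ā = exp(ΔA), B̄ = ΔB, and merges the two directions by summation. `ssm_scan` and `bidirectional_ssm` are hypothetical names.

```python
import numpy as np


def ssm_scan(x, A, B, C, delta):
    """Linear-time scan of a diagonal state space model over a token sequence.

    x: (L, D) token sequence; each of the D channels has its own N-dim state.
    A: (D, N) diagonal of the state matrix (negative entries for stability).
    B, C: (D, N) input/output projections; delta: (D,) per-channel step sizes.
    Assumption: parameters are fixed (time-invariant), unlike Mamba's selective scan.
    """
    L, D = x.shape
    A_bar = np.exp(delta[:, None] * A)   # (D, N) discretized state transition
    B_bar = delta[:, None] * B           # (D, N) simplified input discretization
    h = np.zeros_like(A)                 # (D, N) hidden state
    y = np.empty_like(x)
    for t in range(L):                   # each token is visited once: O(L)
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = (C * h).sum(axis=1)       # read out one value per channel
    return y


def bidirectional_ssm(x, params_fwd, params_bwd):
    """Scan tokens left-to-right and right-to-left, then sum both outputs,
    so every token receives context from both ends of the sequence."""
    y_fwd = ssm_scan(x, *params_fwd)
    y_bwd = ssm_scan(x[::-1], *params_bwd)[::-1]
    return y_fwd + y_bwd
```

Each direction touches every token once, so cost grows linearly with the number of subgraph tokens. Full self-attention over the same tokens grows quadratically. The backward pass is what lets early tokens in the sequence see later ones, which a single causal scan cannot do.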

Paper Structure

This paper contains 17 sections, 3 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: VR eye-tracking experiment for saliency. (a) Virtual-space setup of the experiment; (b) collection of eye-tracking fixation intersections used to generate saliency maps.
  • Figure 2: Model architecture, including texture alignment and geometric structure within the graph convolution encoder, along with subgraph embedding, the Mamba block, and feature propagation for dense prediction.
  • Figure 3: Texture alignment with implicit representation.
  • Figure 4: Feature types of geometric structure.
  • Figure 5: Visualization results of compared methods on the non-textured meshes.
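Figure 1(b) describes building saliency maps from fixation intersections. A common recipe, assumed here rather than taken from the paper, places a Gaussian at each 3D fixation point and evaluates it at every vertex. The sketch below follows that recipe. The use of Euclidean rather than geodesic distance, the kernel width `sigma`, min-max normalization, and the name `fixations_to_vertex_saliency` are all hypothetical.

```python
import numpy as np


def fixations_to_vertex_saliency(vertices: np.ndarray, fixations: np.ndarray,
                                 sigma: float = 0.05) -> np.ndarray:
    """Turn 3D fixation points on a mesh surface into a per-vertex saliency map.

    vertices: (V, 3) vertex positions.
    fixations: (F, 3) gaze-ray/mesh intersection points.
    Assumption: Euclidean (not geodesic) distance and a hand-picked sigma in mesh units.
    """
    diff = vertices[:, None, :] - fixations[None, :, :]       # (V, F, 3)
    sq_dist = np.einsum("vfk,vfk->vf", diff, diff)             # (V, F) squared distances
    sal = np.exp(-sq_dist / (2.0 * sigma ** 2)).sum(axis=1)    # sum of Gaussians per vertex
    rng = sal.max() - sal.min()
    # Min-max normalize to [0, 1]; return zeros if the map is constant (e.g. no fixations).
    return (sal - sal.min()) / rng if rng > 0 else np.zeros_like(sal)
```

The pairwise (V, F, 3) difference array is fine for a sketch but becomes memory-heavy on dense meshes. A KD-tree radius query such as `scipy.spatial.cKDTree.query_ball_point` is the usual way to restrict each fixation to nearby vertices.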