SAMa: Material-aware 3D Selection and Segmentation
Michael Fischer, Iliyan Georgiev, Thibault Groueix, Vladimir G. Kim, Tobias Ritschel, Valentin Deschaintre
TL;DR
SAMa addresses material selection on 3D objects by fine-tuning a video selection model (SAM2) on a material-focused dataset so that it produces multiview-consistent 2D material predictions. It then lifts these 2D similarities into a lightweight 3D similarity point cloud via depth back-projection and queries that cloud with nearest-neighbor lookups, enabling interactive, cross-view material selection across arbitrary 3D representations (NeRFs, 3D Gaussians, meshes) without per-asset optimization. The approach improves selection accuracy and multiview consistency over strong baselines and offers fast per-view visualization. It supports material-aware editing, replacement, and segmentation in 3D synthesis pipelines, including X-to-3D workflows such as text-to-3D.
Abstract
Decomposing 3D assets into material parts is a common task for artists and creators, yet remains a highly manual process. In this work, we introduce Select Any Material (SAMa), a material selection approach for various 3D representations. Building on the recently introduced SAM2 video selection model, we extend its capabilities to the material domain. We leverage the model's cross-view consistency to create a 3D-consistent intermediate material-similarity representation in the form of a point cloud from a sparse set of views. Nearest-neighbour lookups in this similarity cloud allow us to efficiently reconstruct accurate continuous selection masks over objects' surfaces that can be inspected from any view. Our method is multiview-consistent by design, obviating the need for contrastive learning or feature-field pre-processing, and performs optimization-free selection in seconds. Our approach works on arbitrary 3D representations and outperforms several strong baselines in terms of selection accuracy and multiview consistency. It enables several compelling applications, such as replacing the diffuse-textured materials on a text-to-3D output, or selecting and editing materials on NeRFs and 3D Gaussians.
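The lifting step described in the abstract (depth back-projection into a similarity point cloud, followed by nearest-neighbour lookups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `backproject` and `query_similarity`, the pinhole intrinsics `fx, fy, cx, cy`, the brute-force search, and the 0.5 selection threshold are all assumptions made for illustration. The sketch also stays in the camera space of a single view, whereas a multi-view pipeline would first transform points into a shared world frame using the camera poses.

```python
import math

def backproject(depth, similarity, fx, fy, cx, cy):
    """Lift every pixel with valid depth to a camera-space 3D point
    that carries its 2D similarity value (assumed pinhole camera)."""
    cloud = []
    for v, (depth_row, sim_row) in enumerate(zip(depth, similarity)):
        for u, (z, s) in enumerate(zip(depth_row, sim_row)):
            if z <= 0.0:  # assumption: non-positive depth marks background
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cloud.append(((x, y, z), s))
    return cloud

def query_similarity(cloud, point, k=1):
    """Average similarity of the k nearest cloud points (brute-force search)."""
    if not cloud:
        raise ValueError("similarity cloud is empty")
    nearest = sorted(cloud, key=lambda entry: math.dist(entry[0], point))[:k]
    return sum(s for _, s in nearest) / len(nearest)

# Toy 2x2 view: per-pixel depth and per-pixel material similarity.
depth = [[1.0, 1.0], [0.0, 2.0]]
similarity = [[0.9, 0.1], [0.5, 0.7]]
cloud = backproject(depth, similarity, fx=1.0, fy=1.0, cx=0.5, cy=0.5)

# A surface point seen from another view is selected if the similarity
# of its nearest neighbour exceeds a (hypothetical) threshold.
selected = query_similarity(cloud, (-0.5, -0.5, 1.0)) > 0.5
```

Because the lookup only needs 3D surface points, the same query could in principle serve meshes, NeRFs, or 3D Gaussians once a depth value per pixel is available. A practical version would replace the brute-force sort with a spatial index such as a k-d tree.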
