MaRI: Material Retrieval Integration across Domains

Jianhui Wang, Zhifei Yang, Yangfan He, Huixiong Zhang, Yuxuan Chen, Jingwei Huang

TL;DR

MaRI tackles material retrieval by learning a shared embedding space that bridges visual and material properties across synthetic and real data. It uses dual DINOv2-based encoders, trained with a contrastive loss on a large mixed dataset $\mathcal{D}=\{(\mathcal{D}_{synthetic},\mathcal{D}_{real})\}$, and fine-tunes only the last Transformer block. The method improves instance-level and class-level retrieval on both trained and unseen material galleries, outperforming baselines such as ViT, CLIP, DINOv2, Make-it-Real, and MaPa. The dataset, built from ZeST-based real-material spheres and Blender renders, helps bridge the domain gap and supports robust material retrieval for photorealistic 3D asset creation. This work enables efficient, material-aware search and application in design pipelines.

Abstract

Accurate material retrieval is critical for creating realistic 3D assets. Existing methods rely on datasets that capture shape-invariant and lighting-varied representations of materials, which are scarce and face challenges due to limited diversity and inadequate real-world generalization. Most current approaches adopt traditional image search techniques. They fall short in capturing the unique properties of material spaces, leading to suboptimal performance in retrieval tasks. Addressing these challenges, we introduce MaRI, a framework designed to bridge the feature space gap between synthetic and real-world materials. MaRI constructs a shared embedding space that harmonizes visual and material attributes through a contrastive learning strategy by jointly training an image and a material encoder, bringing similar materials and images closer while separating dissimilar pairs within the feature space. To support this, we construct a comprehensive dataset comprising high-quality synthetic materials rendered with controlled shape variations and diverse lighting conditions, along with real-world materials processed and standardized using material transfer techniques. Extensive experiments demonstrate the superior performance, accuracy, and generalization capabilities of MaRI across diverse and complex material retrieval tasks, outperforming existing methods.
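The contrastive objective described above, which pulls matching image and material embeddings together and pushes mismatched pairs apart, can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the InfoNCE-style form, the `temperature` value, and the function names below are assumptions chosen for illustration, not the authors' implementation. Plain Python lists stand in for encoder outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(image_embs, material_embs, temperature=0.07):
    """Image-to-material InfoNCE loss (assumed form, not from the paper).

    image_embs[i] and material_embs[i] are treated as the positive pair;
    every other material in the batch acts as a negative.
    """
    total = 0.0
    for i, img in enumerate(image_embs):
        logits = [cosine_similarity(img, mat) / temperature
                  for mat in material_embs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of the positive pair.
        total += log_denom - logits[i]
    return total / len(image_embs)


# Toy batch: two images with their matching material embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_materials = [[1.0, 0.0], [0.0, 1.0]]
shuffled_materials = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(images, aligned_materials)
loss_shuffled = info_nce(images, shuffled_materials)
```

In this toy batch the aligned pairing yields a much lower loss than the shuffled pairing, which is the behaviour a contrastive objective of this kind rewards during training.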

Paper Structure

This paper contains 19 sections, 6 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Comparison between Material Palette and MaRI.
  • Figure 2: Overview of our dataset construction pipeline. (a) Synthetic materials are generated from 3D models obtained from Objaverse, combined with textures from AmbientCG, and rendered with HDR images. (b) Real-world materials are selected and segmented using Grounded-SAM and then transformed into material spheres via the ZeST method.
  • Figure 2: Material assignment and rendering for a 3D chair model.
  • Figure 3: The architecture of the MaRI framework for contrastive fine-tuning in material retrieval. MaRI uses DINOv2-based encoders for both image and material feature extraction, fine-tuning only the last Transformer block, while keeping the rest of the model frozen. During inference, cosine similarity between image and material embeddings is used to retrieve the most relevant materials from the library.
  • Figure 3: Material assignment and rendering for a 3D plant model.
  • ...and 6 more figures
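
The inference step mentioned in the Figure 3 caption, ranking library materials by cosine similarity to an image embedding, can be sketched as follows. The gallery names, the three-dimensional vectors, and the `retrieve` helper are hypothetical placeholders; in the described framework the embeddings would come from the DINOv2-based image and material encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, gallery, k=3):
    """Return the k gallery entries most similar to the query embedding.

    gallery maps a material name to its embedding (hypothetical layout).
    """
    scored = [(name, cosine_similarity(query_emb, emb))
              for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings standing in for material-encoder outputs.
gallery = {
    "oak_wood": [0.9, 0.1, 0.0],
    "brushed_steel": [0.0, 0.2, 0.9],
    "walnut_wood": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for an image-encoder output

top = retrieve(query, gallery, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Because ranking depends only on the angle between vectors, the gallery embeddings can be computed once and reused for every query.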