MatE: Material Extraction from Single-Image via Geometric Prior

Zeyu Zhang, Wei Zhai, Jian Yang, Yang Cao

TL;DR

MatE presents a coarse-to-fine framework for recovering tileable PBR material maps from a single image. It first rectifies perspective distortion using an estimated depth map as a geometric prior, then refines residual distortions with a dual-branch diffusion model conditioned through KV injection. Training relies on rotation-aligned synthetic data to bridge the synthetic-real domain gap and enforce consistent material orientation. Empirical results on synthetic and real datasets show state-of-the-art performance on perceptual metrics (LPIPS, CLIP) while maintaining competitive structural fidelity (SSIM), indicating robustness to viewpoint and illumination variations. The work offers a practical, diffusion-based path toward democratizing high-quality material extraction for real-world graphics pipelines, with insights on tileability and implementation trade-offs.

Abstract

The creation of high-fidelity, physically-based rendering (PBR) materials remains a bottleneck in many graphics pipelines, typically requiring specialized equipment and expert-driven post-processing. To democratize this process, we present MatE, a novel method for generating tileable PBR materials from a single image taken under unconstrained, real-world conditions. Given an image and a user-provided mask, MatE first performs coarse rectification using an estimated depth map as a geometric prior, and then employs a dual-branch diffusion model. Leveraging a learned consistency from rotation-aligned and scale-aligned training data, this model further rectifies residual distortions from the coarse result and translates it into a complete set of material maps, including albedo, normal, roughness, and height. Our framework achieves invariance to the unknown illumination and perspective of the input image, allowing for the recovery of intrinsic material properties from casual captures. Through comprehensive experiments on both synthetic and real-world data, we demonstrate the efficacy and robustness of our approach, enabling users to create realistic materials from real-world images.
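The coarse rectification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation (its equations are not reproduced in this excerpt); it assumes a pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy`, assumes the masked region is roughly planar, and uses hypothetical helper names. The idea is to lift masked pixels to 3D with the depth map, fit a plane, and express each point in that plane's own 2D coordinates, which removes the perspective foreshortening.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Lift every pixel to a 3D camera-space point (pinhole model, assumed)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

def fit_plane_basis(points):
    """Least-squares plane via SVD: centroid, two in-plane axes, and normal."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    # Rows of vt are ordered by decreasing variance; the last one is the normal.
    return centroid, vt[0], vt[1], vt[2]

def rectify_coords(depth, mask, fx, fy, cx, cy):
    """Return fronto-parallel 2D coordinates for the masked pixels (hypothetical helper)."""
    pts = unproject(depth, fx, fy, cx, cy)[mask]  # shape (n, 3)
    centroid, e1, e2, normal = fit_plane_basis(pts)
    centered = pts - centroid
    uv = np.stack([centered @ e1, centered @ e2], axis=-1)  # shape (n, 2)
    return uv, normal
```

Scattering pixel colors to these `uv` coordinates generally leaves gaps where the surface was foreshortened, so a practical pipeline would still need a resampling or interpolation pass afterwards.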

Paper Structure

This paper contains 22 sections, 8 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: We propose MatE, a novel method for high-fidelity Physically Based Rendering (PBR) material extraction. Given a region, our method first performs rectification via a geometric prior, then further reduces distortion and extracts the target material. The region can be sourced from user input or from segmentation models such as SAM (Kirillov et al., 2023). The extracted PBR materials (Albedo, Normal, Roughness, Height) enable the construction and texturing of realistic 3D scenes.
  • Figure 2: Overview of our pipeline. $\mathcal{E}$ denotes the pre-trained encoder. (Left) Our model consists of a Reference U-Net that processes the masked input latents to extract conditional KV features, and a Main U-Net that denoises the latent material maps ($z_{t^{\prime}}$) guided by the injected KV features. (Right) Visualization of our coarse rectification based on the geometric prior.
  • Figure 3: Overview of our dataset construction pipeline. We apply thin-plate spline (TPS) transformations to planar meshes to introduce geometric distortions. PBR materials are then mapped onto these meshes using UV coordinates, and HDRIs are employed for realistic environmental illumination. From randomly sampled camera positions and viewpoints, we then utilize Blender to render synthetic images and their corresponding masks, concurrently saving the camera poses which are essential during our training.
  • Figure 4: Our unprojection (Eq. \ref{eq:project}) generates a coarsely rectified texture that suffers from holes (Column 4). Our interpolation (Eq. \ref{eq:interpolate}) fills these holes to produce a dense map (Column 5).
  • Figure 5: To circumvent artifacts such as physically implausible scaling variations (e.g., inset (a)) and structural disruptions caused by discontinuous UVs (e.g., insets (b) and (c)) that arise when mapping materials to complex 3D models, we generate our dataset using topologically simpler planar meshes distorted via thin-plate spline transformations.
  • ...and 10 more figures
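
The KV injection named in the Figure 2 caption can be sketched in a few lines. The excerpt does not say how the Reference U-Net features enter the Main U-Net attention, so this sketch assumes the common variant in which reference keys and values are concatenated to the main branch's own keys and values. The function names and the single-head, NumPy-only setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kv_injection(q_main, k_main, v_main, k_ref, v_ref):
    """Single-head attention where main-branch queries also attend to reference tokens.

    q_main, k_main, v_main: (n_main, d) arrays from the main branch.
    k_ref, v_ref: (n_ref, d) arrays from the reference branch (assumed concatenation).
    """
    k = np.concatenate([k_main, k_ref], axis=0)  # (n_main + n_ref, d)
    v = np.concatenate([v_main, v_ref], axis=0)
    scores = q_main @ k.T / np.sqrt(q_main.shape[-1])
    return softmax(scores, axis=-1) @ v  # (n_main, d)
```

With this design, passing zero reference tokens reduces the layer to ordinary self-attention, so the reference branch acts purely as extra context for the denoising branch.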