
LLM-Guided Material Inference for 3D Point Clouds

Nafiseh Izadyar, Teseo Schneider

TL;DR

This work tackles the lack of material and appearance annotations for 3D shapes by introducing a two-stage, zero-shot LLM-driven framework that first extracts object semantics from coarse geometry and then assigns materials to each geometric segment conditioned on those semantics. The approach leverages multi-view renderings and LLM prompts, and evaluates material plausibility using an LLM-as-a-judge framework (DeepEval) across 1,000 shapes from Fusion/ABS and ShapeNet. Results show high semantic accuracy and strong per-segment material plausibility, highlighting the potential of LLM priors to bridge geometric reasoning and appearance understanding in 3D data without labeled materials. This work opens avenues for material-aware 3D perception that can benefit photorealistic rendering and robotics without requiring extensive material annotations.

Abstract

Most existing 3D shape datasets and models focus solely on geometry, overlooking the material properties that determine how objects appear. We introduce a two-stage, large language model (LLM)-based method for inferring material composition directly from 3D point clouds with coarse segmentations. Our key insight is to decouple reasoning about what an object is from what it is made of. In the first stage, an LLM predicts the object's semantics; in the second stage, it assigns plausible materials to each geometric segment, conditioned on the inferred semantics. Both stages operate in a zero-shot manner, without task-specific training. Because existing datasets lack reliable material annotations, we evaluate our method using an LLM-as-a-Judge implemented in DeepEval. Across 1,000 shapes from Fusion/ABS and ShapeNet, our method achieves high semantic and material plausibility. These results demonstrate that language models can serve as general-purpose priors for bridging geometric reasoning and material understanding in 3D data.
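The two-stage decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable type, the prompt wording, the `Shape` container, and the fixed `MATERIALS` vocabulary (drawn from materials named in the figure captions) are all hypothetical. What it shows is the ordering: Stage 1 produces a semantic label from geometry-only renderings, and every Stage 2 per-segment prompt is conditioned on that label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical LLM interface: a text prompt plus rendered images (raw bytes)
# in, the model's free-form text answer out.
LLM = Callable[[str, List[bytes]], str]

# Assumed material vocabulary (taken from examples in the figure captions).
MATERIALS = ["metal", "plastic", "rubber", "glass", "wood", "fabric", "foam"]


@dataclass
class Shape:
    views: List[bytes]                     # multi-view depth/raster renderings of the whole shape
    segment_views: Dict[str, List[bytes]]  # per-segment renderings with that segment highlighted


def infer_semantics(llm: LLM, shape: Shape) -> str:
    """Stage 1: ask what the object is, using untextured renderings only."""
    prompt = ("These are depth and raster renderings of an untextured 3D shape. "
              "Answer with a single noun naming the object.")
    return llm(prompt, shape.views).strip().lower()


def parse_material(answer: str) -> str:
    """Map a free-form answer onto the vocabulary; 'unknown' if nothing matches."""
    text = answer.lower()
    for material in MATERIALS:
        if material in text:
            return material
    return "unknown"


def infer_materials(llm: LLM, shape: Shape) -> Dict[str, str]:
    """Stage 2: assign one material per segment, conditioned on the Stage 1 label."""
    label = infer_semantics(llm, shape)
    materials: Dict[str, str] = {}
    for seg_id, images in shape.segment_views.items():
        prompt = (f"The object is a {label}. The highlighted part is segment {seg_id}. "
                  f"Which material is it most plausibly made of? "
                  f"Choose from: {', '.join(MATERIALS)}.")
        materials[seg_id] = parse_material(llm(prompt, images))
    return materials


if __name__ == "__main__":
    # Stand-in LLM for illustration only; a real run would call a multimodal model.
    def fake_llm(prompt: str, images: List[bytes]) -> str:
        if "single noun" in prompt:
            return "Pot"
        return "Metal" if "segment body" in prompt else "rubber"

    pot = Shape(views=[b"v0", b"v1"], segment_views={"body": [b"s0"], "handle": [b"s1"]})
    print(infer_materials(fake_llm, pot))  # {'body': 'metal', 'handle': 'rubber'}
```

Because the label is inferred once and then injected into every segment prompt, a misleading local geometry cue (as in the pot example of Figure 1) is interpreted in the context of the whole object rather than in isolation.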

Paper Structure

This paper contains 25 sections, 5 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: A pot-shaped object misclassified without semantic reasoning. Directly inferring materials from geometry assigns incorrect materials (e.g., foam and fabric; left). Our full pipeline first identifies the object as a pot and then assigns plausible materials such as metal, plastic, and rubber (right).
  • Figure 2: Overview of our method: Stage 1 uses an LLM to infer high-level semantic information about the object from multi-view depth and raster renderings; Stage 2 combines the semantic label with the input coarse segmentation to predict a material for each part.
  • Figure 3: Existing state-of-the-art object classification methods [Zhao_2024_CVPR] fail on our raster or depth-only inputs because they rely heavily on texture, color, or contextual cues; here the couch is detected as a traffic light (left and middle). Once the correct materials are assigned (right), the object is correctly identified.
  • Figure 4: Examples of the best view for several segments, each highlighted in pink.
  • Figure 5: An object from the Fusion dataset (top) that clearly represents a Microsoft Kinect but whose material assignments in the data are incorrect: the whole object is labeled as metal and plastic. Our method (bottom) correctly assigns glass to the lenses and plastic to the body.
  • ...and 5 more figures