Hierarchical Material Recognition from Local Appearance

Matthew Beveridge, Shree K. Nayar

TL;DR

This work introduces a visually grounded hierarchy of materials and a dedicated in-the-wild dataset, Matador, to enable hierarchical material recognition from local appearance. It combines a visual taxonomy with a graph attention network to exploit taxonomic proximity, predicting full hierarchical labels and enabling robust inferences even when fine-grained identification fails. The approach achieves state-of-the-art results on Matador and existing benchmarks, benefits from rendering novel views to improve generalization to real-world, out-of-distribution imaging conditions, and demonstrates rapid few-shot adaptation to unseen materials. The contributions have practical implications for robotics and autonomous systems, providing both material-level classifications and associated mechanical properties to guide interaction and manipulation.

Abstract

We introduce a taxonomy of materials for hierarchical recognition from local appearance. Our taxonomy is motivated by vision applications and is arranged according to the physical traits of materials. We contribute a diverse, in-the-wild dataset with images and depth maps of the taxonomy classes. Utilizing the taxonomy and dataset, we present a method for hierarchical material recognition based on graph attention networks. Our model leverages the taxonomic proximity between classes and achieves state-of-the-art performance. We demonstrate the model's potential to generalize to adverse, real-world imaging conditions, and that novel views rendered using the depth maps can enhance this capability. Finally, we show the model's capacity to rapidly learn new materials in a few-shot learning setting.

Paper Structure

This paper contains 18 sections, 7 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Overall Word Occurrences
  • Figure 3: The Matador Dataset. (a-b) Each sample includes a real-world image of a material taken at a high resolution and (c) its 3D structure (depth map). (d) The surrounding context is also captured. The dataset comprises $\sim$7,200 samples across the 57 material categories of the proposed taxonomy (\ref{sec:taxonomy}).
  • Figure 4: Rendering Novel Views from a Real-World Sample. From a captured material sample, we simulate its appearance under different magnifications, orientations, and camera settings. (a) We first create a 3D mesh and texture map it with the appearance image. We then apply spatial transformations to the mesh to change its pose. (b) The optical image of a novel view (including depth of field effects) is obtained by raytracing. It is then blurred to account for pixel area, and the result is sampled to produce the discrete image. Finally, noise is added, resulting in the novel view. By varying the parameters in this process, we render numerous novel views for each real-world sample.
  • Figure 5: Rendered Novel Views for Gravel. Examples of the many novel views rendered from a single real-world sample of gravel (left) using the process described in \ref{sec:dataset}.
  • Figure 6: Top-1 accuracy, ablating novel views during training of our models.
  • ...and 9 more figures
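
The last steps of the Figure 4 caption (blur for pixel area, sample, add noise) can be illustrated with a minimal sketch. This is not the authors' renderer: it skips mesh construction and ray tracing entirely and starts from an already-rendered optical image. The function name `form_discrete_image`, the box-shaped pixel footprint, the additive Gaussian noise model, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def form_discrete_image(optical, factor, noise_sigma, rng):
    """Turn a high-resolution optical image (H x W x C, values in [0, 1])
    into a noisy discrete image, under assumed box-pixel and Gaussian-noise models."""
    h, w = optical.shape[:2]
    h_out, w_out = h // factor, w // factor
    # Crop so the image divides evenly into factor x factor pixel footprints.
    cropped = optical[:h_out * factor, :w_out * factor]
    # Averaging over each footprint acts as a box blur for the pixel area
    # followed by sampling once per output pixel.
    blocks = cropped.reshape(h_out, factor, w_out, factor, -1)
    sampled = blocks.mean(axis=(1, 3))
    # Additive Gaussian noise is an assumption; the caption does not name a noise model.
    noisy = sampled + rng.normal(0.0, noise_sigma, sampled.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
optical = rng.random((64, 96, 3))  # stand-in for a ray-traced optical image
novel_view = form_discrete_image(optical, factor=4, noise_sigma=0.01, rng=rng)
```

Varying `factor` and `noise_sigma` here loosely mimics changing magnification and camera settings, in the spirit of the parameter sweep the caption describes.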