Multi-Part Object Representations via Graph Structures and Co-Part Discovery

Alex Foo, Wynne Hsu, Mong Li Lee

TL;DR

The paper introduces ECO-Net, an Explicit Compositional Network that represents multi-part objects as graphs of parts and learns to discover object wholes via a co-part object discovery algorithm. A memory module stores recurring objects to support downstream tasks, enabling robust occlusion handling and out-of-distribution generalization. Through extensive experiments on simulated, realistic, and real-world datasets, ECO-Net outperforms state-of-the-art methods in object discovery, occlusion-aware perception, and generalization, and its object representations improve downstream property prediction. The approach demonstrates the value of explicit part-whole structure for robust, interpretable object-centric perception in complex scenes.

Abstract

Discovering object-centric representations from images can significantly enhance the robustness, sample efficiency and generalizability of vision models. Existing work on images with multi-part objects typically follows an implicit object representation approach, which fails to recognize these learned objects in occluded or out-of-distribution contexts. This failure stems from the assumption that object part-whole relations are implicitly encoded into the representations through indirect training objectives. We address this limitation by proposing a novel method that leverages explicit graph representations for parts, and by presenting a co-part object discovery algorithm. We then introduce three benchmarks to evaluate the robustness of object-centric methods in recognizing multi-part objects within occluded and out-of-distribution settings. Experimental results on simulated, realistic, and real-world images show marked improvements in the quality of discovered objects compared to state-of-the-art methods, as well as accurate recognition of multi-part objects in occluded and out-of-distribution contexts. We also show that the discovered object-centric representations can more accurately predict key object properties in a downstream task, highlighting the potential of our method to advance the field of object-centric representations.

Paper Structure

This paper contains 30 sections, 1 theorem, 2 equations, 12 figures, 6 tables, 1 algorithm.

Key Result

Theorem 3.1

Given a graph of parts $\hat{\mathcal{G}} = (\mathcal{P}, \hat{\mathcal{E}})$ derived from a batch of images, there exists a pairwise iterative clustering algorithm that solves the composition of recurrent parts with similar spatial relations into objects in $O(|\mathcal{P}|^2 \log |\mathcal{P}|)$ t
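The theorem statement above is cut off in this listing and the paper's actual algorithm is not reproduced here, so the following is only an illustrative sketch of what a pairwise, recurrence-based grouping could look like. It swaps in a plain union-find merge over part pairs that recur with a similar quantised offset. The function name `discover_objects`, the parameters `bin_size` and `min_support`, and the assumption that every part carries a label identifying its recurring type are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def discover_objects(images, bin_size=5.0, min_support=2):
    """Group part labels that recur with a similar relative offset.

    images: list of dicts mapping a (hypothetical) part label to its
            (y, x) centroid in that image.
    Returns a sorted list of sorted label groups.
    """
    # Count how often each pair of parts appears with the same quantised offset.
    support = Counter()
    for parts in images:
        for a, b in combinations(sorted(parts), 2):
            dy = parts[b][0] - parts[a][0]
            dx = parts[b][1] - parts[a][1]
            # Quantise so that near-identical spatial relations share a key.
            key = (a, b, round(dy / bin_size), round(dx / bin_size))
            support[key] += 1

    labels = sorted({label for parts in images for label in parts})
    parent = {label: label for label in labels}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit the most frequently recurring relations first and merge their parts.
    for (a, b, _, _), count in sorted(support.items(), key=lambda kv: -kv[1]):
        if count < min_support:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch the sort over candidate pairs is the most expensive step. With at most $|\mathcal{P}|^2$ pairs it costs on the order of $|\mathcal{P}|^2 \log |\mathcal{P}|$ comparisons, which has the same shape as the bound quoted in the theorem. That is an observation about this sketch only, not a restatement of the paper's proof.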

Figures (12)

  • Figure 1: Comparison of existing methods. (a) Information-bottleneck approaches face challenges with multi-part object recognition, while (b) methods that rely on learned features struggle to compose part clusters into object wholes. Our proposed Explicit Compositional Network (ECO-Net) consistently and accurately composes parts into object wholes.
  • Figure 2: Overview of our proposed ECO-Net.
  • Figure 3: Overview of object discovery. (a) Each node in the graph summarizes the shape of each part using vectors from its centroid to the sampled boundary pixels, while each edge depicts the spatial relationships between parts. (b) Repeated occurrences of parts with similar spatial relationships are grouped into objects.
  • Figure 4: Visualization of discovered objects.
  • Figure 5: Visualization of discovered objects with occlusion.
  • ...and 7 more figures
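
The node and edge description in the Figure 3 caption can be made concrete with a small sketch. It assumes a part is given as a binary mask (a list of rows), that boundary pixels are foreground pixels with a background or out-of-image 4-neighbour, and that samples are taken at even intervals after ordering the boundary by angle. None of these choices are stated in the source, and the names `shape_descriptor` and `relative_offset` are hypothetical.

```python
import math


def shape_descriptor(mask, num_samples=8):
    """Return a part's centroid and vectors from it to sampled boundary pixels."""
    height, width = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    if not pixels:
        raise ValueError("mask has no foreground pixels")

    # Centroid of the foreground pixels.
    cy = sum(r for r, _ in pixels) / len(pixels)
    cx = sum(c for _, c in pixels) / len(pixels)

    def is_boundary(r, c):
        # A foreground pixel is on the boundary if any 4-neighbour is
        # background or lies outside the image (assumed definition).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < height and 0 <= cc < width) or not mask[rr][cc]:
                return True
        return False

    boundary = [p for p in pixels if is_boundary(*p)]
    # Order by angle around the centroid so evenly spaced picks cover the outline.
    boundary.sort(key=lambda p: math.atan2(p[0] - cy, p[1] - cx))

    count = min(num_samples, len(boundary))
    step = len(boundary) / count
    sampled = [boundary[int(i * step)] for i in range(count)]
    return (cy, cx), [(r - cy, c - cx) for r, c in sampled]


def relative_offset(centroid_a, centroid_b):
    """Edge feature (assumed form): offset from part a's centroid to part b's."""
    return (centroid_b[0] - centroid_a[0], centroid_b[1] - centroid_a[1])
```

Calling `shape_descriptor` on each part mask would give the node features, and `relative_offset` on each pair of centroids would give one simple form of edge feature. The paper may encode spatial relationships differently.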

Theorems & Definitions (2)

  • Theorem 3.1
  • proof