PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu

TL;DR

PACE is a large-scale benchmark for 3D pose estimation and tracking in cluttered environments. Its real-world portion contains 55K frames with 258K annotations covering 238 objects from 43 categories, and its synthetic portion, PACE-Sim, adds 100K frames with 2.4M annotations. The paper introduces a calibrated 3-camera annotation pipeline and evaluation protocols for pose estimation and tracking, revealing substantial generalization gaps and sim-to-real challenges, particularly for articulated objects and depth-based methods. The dataset combines diverse real-world scenes with articulated objects, 3D scans, and segmentation masks, supporting supervision and fair comparison across instance- and category-level tasks. The authors provide baselines, failure analyses, and insights to guide future work toward more robust, scalable, and transferable pose estimation systems in real-world clutter.

Abstract

We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE provides a large-scale real-world benchmark for both instance-level and category-level settings. The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we develop an innovative annotation system with a calibrated 3-camera setup. Additionally, we offer PACE-Sim, which contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects. We test state-of-the-art algorithms on PACE along two tracks, pose estimation and object pose tracking, revealing the benchmark's challenges and research opportunities. Our benchmark code and data are available at https://github.com/qq456cvb/PACE.

Paper Structure

This paper contains 44 sections, 1 equation, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: We propose PACE: a large-scale object pose dataset, with diverse objects, complex scenes, and various types of occlusions, reflecting real-world challenges.
  • Figure 2: While current state-of-the-art methods yield satisfactory outcomes on the NOCS-REAL275 dataset, their models' performance significantly deteriorates when transferred to previously unseen datasets such as PACE. Left: Qualitative visualizations of various models' pose predictions for a mug in REAL275 vs. a mug in PACE. Right: The performance of state-of-the-art methods markedly declines on PACE, even when evaluating on categories that exist in both datasets.
  • Figure 3: Overview of the PACE annotation pipeline.
  • Figure 4: Illustration of the marker inpainting process.
  • Figure 5: Distribution of pose annotation counts across different object categories.
  • ...and 3 more figures