EgoXtreme: A Dataset for Robust Object Pose Estimation in Egocentric Views under Extreme Conditions

Taegyoon Yoon, Yegyu Han, Seojin Ji, Jaewoo Park, Sojeong Kim, Taein Kwon, Hyung-Sin Kim

Abstract

Smart glasses are emerging as useful devices, providing rich insights in hands-busy, eyes-on-task situations. To understand the wearer's context, 6D object pose estimation in egocentric views is becoming essential. However, existing 6D object pose estimation benchmarks fail to capture the challenges of real-world egocentric applications, which are often dominated by severe motion blur, dynamic illumination, and visual obstructions. This discrepancy creates a significant gap between controlled lab data and chaotic real-world applications. To bridge this gap, we introduce EgoXtreme, a new large-scale 6D pose estimation dataset captured entirely from an egocentric perspective. EgoXtreme features three challenging scenarios - industrial maintenance, sports, and emergency rescue - designed to introduce severe perceptual ambiguities through extreme lighting, heavy motion blur, and smoke. Evaluations of state-of-the-art generalizable pose estimators on EgoXtreme indicate that their generalization fails to hold in extreme conditions, especially under low light. We further demonstrate that simply applying image restoration (e.g., deblurring) yields no improvement under extreme conditions, whereas tracking-based approaches do show gains, suggesting that exploiting temporal information is valuable in fast-motion scenarios. We conclude that EgoXtreme is an essential resource for developing and evaluating the next generation of pose estimation models robust enough for real-world egocentric vision. The dataset and code are available at https://taegyoun88.github.io/EgoXtreme/

Paper Structure

This paper contains 27 sections, 13 figures, and 10 tables.

Figures (13)

  • Figure 1: EgoXtreme, an egocentric dataset for robust 6D object pose estimation in extreme environments. The dataset provides 775.5 minutes of egocentric RGB video from 15 participants using Aria glasses. As illustrated, it spans three challenging scenarios—Sports, Maintenance, and Emergency—featuring significant real-world visual degradations such as low light, smoke, and motion blur.
  • Figure 2: Visualization of ground truth 6D pose annotations from our dataset, arranged sequentially from top-left to bottom-right. The frames include the sports scenario (normal and middle light), the industrial maintenance scenario (low light, flashlight with smoke, and headlight), and the emergency rescue scenario (warning light, exit green light, and high light with smoke).
  • Figure 3: 3D models. This image shows the 13 object models in the EgoXtreme dataset. From top to bottom: five objects for the maintenance scenario, five for the sports scenario, and three for the emergency scenario.
  • Figure 4: Diagram of the data collection process.
  • Figure 5: Example 6D pose estimation results from baseline models. Red lines are predictions and green lines are ground truth. (a), (b), and (c) are the industrial maintenance, sports, and emergency rescue scenarios, respectively. The top row shows standard lighting conditions, and the bottom row shows extreme lighting conditions.
  • ...and 8 more figures