CaRe-Ego: Contact-aware Relationship Modeling for Egocentric Interactive Hand-object Segmentation

Yuejiao Su, Yi Wang, Lap-Pui Chau

TL;DR

CaRe-Ego addresses EgoIHOS, the task of segmenting hands and objects interacting with hands in egocentric images. It introduces HOFE to inject hand priors into object feature learning and CODS to decouple object categories and avoid two-hand classification, enabling stronger hand–object contact modeling. The method achieves state-of-the-art performance on EgoHOS in-domain and out-of-domain datasets and demonstrates robust generalization on mini-HOI4D. Ablation studies validate the contribution of HOFE and CODS to the gains. This work advances fine-grained egocentric segmentation with explicit interaction modeling, with practical impact on AR/VR and assistive systems.

Abstract

Egocentric interactive hand-object segmentation (EgoIHOS) requires segmenting hands and the objects they interact with in egocentric images, which is crucial for understanding human behavior in assistive systems. Previous methods typically recognize hands and interacting objects as distinct semantic categories based solely on visual features, or simply use hand predictions as auxiliary cues for object segmentation. Despite their promising progress, these methods fail to adequately model the interactive relationships between hands and objects and ignore the coupled physical relationships among object categories, which ultimately constrains their segmentation performance. To address these shortcomings, we propose a novel method called CaRe-Ego that achieves state-of-the-art performance by emphasizing the contact between hands and objects from two aspects. First, we introduce a Hand-guided Object Feature Enhancer (HOFE) that establishes hand-object interactive relationships in order to extract more contact-relevant and discriminative object features. Second, we design the Contact-centric Object Decoupling Strategy (CODS) to explicitly model and disentangle the coupling relationships among object categories, thereby emphasizing contact-aware feature learning. Experiments on various in-domain and out-of-domain test sets show that CaRe-Ego significantly outperforms existing methods with robust generalization capability. Code is publicly available at https://github.com/yuggiehk/CaRe-Ego/.

Paper Structure

This paper contains 23 sections, 10 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Illustration of EgoIHOS task. This task aims to segment the input egocentric image into five categories: left hand, right hand, left-hand objects, right-hand objects, and two-hand objects.
  • Figure 2: Overall diagram of the proposed CaRe-Ego. The method comprises four main components: an encoder, a multi-branch decoder, a hand-guided object feature enhancer (HOFE) (Sec. \ref{sec::HOR}), and a contact-centric object decoupling strategy (CODS) (Sec. \ref{sec::ordm}). "Rep." in this figure denotes representations.
  • Figure 3: Detailed architecture of the proposed HOFE. Taking the features of hands and objects as input, this module accomplishes hand-guided attention to model the relationship between the hands and interacting objects, promoting the contact-relevant object feature learning of the network.
  • Figure 4: Visualization results of CaRe-Ego compared with the multi-stage method $\text{Seq.}^\flat$ on the EgoHOS in-domain test set. The main improvements are highlighted in the dashed yellow boxes.
  • Figure 5: Visualization results for two-hand objects on the EgoHOS in-domain test set, compared with the multi-stage method $\text{Seq.}^\flat$. The main improvements are highlighted in the dashed yellow boxes.
  • ...and 1 more figure
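
The Figure 3 caption describes HOFE as hand-guided attention over hand and object features. The snippet below is only a rough illustration of that general idea, not the authors' implementation. It assumes plain scaled dot-product cross-attention in which object tokens act as queries and hand tokens act as keys and values, followed by a residual addition. The function name `hand_guided_attention`, the token layout, and the absence of learned projections are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def hand_guided_attention(obj_feats, hand_feats):
    """Hypothetical sketch: enhance object tokens with hand context.

    obj_feats:  list of N object token vectors (each of length d)
    hand_feats: list of M hand token vectors (each of length d)
    Returns N vectors: each object token plus an attention-weighted
    sum of hand tokens (assumed residual connection, no learned weights).
    """
    d = len(obj_feats[0])
    scale = 1.0 / math.sqrt(d)
    enhanced = []
    for q in obj_feats:
        # Similarity between this object token and every hand token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in hand_feats]
        weights = softmax(scores)
        # Hand context gathered for this object token.
        context = [
            sum(w * v[j] for w, v in zip(weights, hand_feats)) for j in range(d)
        ]
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced


# Toy usage: two object tokens attending to two hand tokens.
obj = [[1.0, 0.0], [0.0, 1.0]]
hand = [[1.0, 0.0], [0.0, 1.0]]
out = hand_guided_attention(obj, hand)
print(len(out), len(out[0]))
```

In this toy setup, each object token gives more weight to the hand token it is most similar to, which is one simple way hand features could steer object features toward contact regions. A real module would operate on feature maps and include learned projections.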