UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection

Mu Chen; Minghan Chen; Yi Yang

TL;DR

This paper proposes an approach to HOI detection that explicitly estimates prediction uncertainty during training to refine both detection and interaction predictions. The uncertainty is modeled through the variance of predictions and incorporated into the optimization objective, allowing the model to adaptively adjust its confidence threshold based on prediction variance.

Abstract

This paper focuses on Human-Object Interaction (HOI) detection, addressing the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. Spearheaded by Detection Transformer (DETR), recent developments have led to significant improvements by replacing traditional region proposals with a set of learnable queries. However, despite the powerful representation capabilities provided by Transformers, existing HOI detection methods still yield low confidence when dealing with complex interactions and are prone to overlooking interactive actions. To address these issues, we propose UAHOI (Uncertainty-aware Robust Human-Object Interaction Learning), a novel approach that explicitly estimates prediction uncertainty during training to refine both detection and interaction predictions. Our model not only predicts the HOI triplets but also quantifies the uncertainty of these predictions. Specifically, we model this uncertainty through the variance of predictions and incorporate it into the optimization objective, allowing the model to adaptively adjust its confidence threshold based on prediction variance. This integration serves as an automatic confidence threshold and, without any hand-designed components, mitigates the adverse effects of the incorrect or ambiguous predictions that are common in traditional methods. Our method can be integrated flexibly with existing HOI detection methods and improves their accuracy. We evaluate UAHOI on two standard benchmarks in the field, V-COCO and HICO-DET, which represent challenging scenarios for HOI detection. Through extensive experiments, we demonstrate that UAHOI achieves significant improvements over existing state-of-the-art methods, enhancing both the accuracy and robustness of HOI detection.
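The idea of using prediction variance inside the optimization objective can be illustrated with a small sketch. The abstract does not give the loss itself, so everything below is an assumption made for illustration: two hypothetical prediction heads (`logits_a`, `logits_b`), variance measured across their softmax outputs, and a commonly used weighting of the form exp(-variance) * loss + variance. It is not the paper's formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uncertainty_weighted_loss(logits_a, logits_b, labels):
    """Illustrative only: variance across two hypothetical heads
    down-weights the loss of samples the heads disagree on."""
    probs = np.stack([softmax(logits_a), softmax(logits_b)])  # (2, N, C)
    var = probs.var(axis=0).sum(axis=-1)                      # (N,) per-sample variance
    mean_p = probs.mean(axis=0)                               # (N, C) averaged prediction
    idx = np.arange(len(labels))
    ce = -np.log(mean_p[idx, labels] + 1e-12)                 # per-sample cross-entropy
    # exp(-var) shrinks the loss of uncertain samples; the "+ var" term
    # keeps the model from inflating variance to escape the loss.
    loss = np.exp(-var) * ce + var
    return loss.mean(), var
```

In this sketch the variance acts as a soft, per-sample confidence weight rather than a fixed threshold, which is one way to read the "automatic confidence threshold" described above.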

Paper Structure (17 sections, 8 equations, 5 figures, 5 tables)

Introduction
Related Works
Traditional HOI Detection
End-to-End HOI Detection
Uncertainty Estimation
Methods
Preliminary: Vanilla Transformer-based HOI Detection
Uncertainty-aware Instance Localization
Uncertainty-aware Interaction Refinement
Implementation Details
Experimental Results
Results for HICO-DET
Results for V-COCO
Qualitative Results
Ablation Study
...and 2 more sections

Figures (5)

Figure 1: Common challenges of current HOI Detection methods in complex scenes. The human/object bounding boxes are shown in blue/yellow.
Figure 2: Overall framework of our UAHOI. UAHOI consists of three components: Visual Feature Extractor, Parallel Decoder and Uncertainty Estimation module. Visual features are first extracted by a CNN and a shared Transformer Encoder. Then, the Localization Decoder and Interaction Decoder run in parallel to extract human/object bounding boxes and interaction classes. Lastly, the proposed Uncertainty-aware Instance Localization and Interaction Refinement modules are used to perform uncertainty regularization.
Figure 3: Visualization results of our UAHOI.
Figure 4: For a more comprehensive validation of the effects of different uncertainty estimation methods, we further design interaction prediction tests that incorporate MC dropout (b) and that use Fully Feed-Forward Network (FFN) architectures of varying depths (c). Adding dropout at different layers yields varying degrees of prediction variance. The results, comparisons, and further analyses are presented in Table \ref{tab:strategy}.
Figure 5: Visualization results of two failure cases.
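
The MC dropout variant mentioned in the Figure 4 caption can be sketched as follows. This is a generic illustration of MC dropout, not the authors' implementation: the single linear layer, the sigmoid output (chosen here because interaction labels are often treated as multi-label), and all names and default values (`mc_dropout_predict`, `weights`, `drop_rate`, `passes`) are hypothetical.

```python
import numpy as np

def mc_dropout_predict(x, weights, drop_rate=0.5, passes=20, seed=0):
    """Run several stochastic forward passes with dropout kept active
    and return the mean prediction and its variance across passes."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(passes):
        # Sample a fresh dropout mask per pass (inverted-dropout scaling).
        mask = rng.random(x.shape) >= drop_rate
        x_drop = x * mask / (1.0 - drop_rate)
        logits = x_drop @ weights
        outputs.append(1.0 / (1.0 + np.exp(-logits)))  # sigmoid scores
    outputs = np.stack(outputs)                        # (passes, N, C)
    return outputs.mean(axis=0), outputs.var(axis=0)
```

The variance returned here plays the role of the prediction uncertainty; placing dropout at different layers of a deeper network would change how large that variance is, which is the effect the caption describes.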
