Intention Action Anticipation Model with Guide-Feedback Loop Mechanism

Zongnan Ma; Fuchun Zhang; Zhixiong Nan; Yao Ge

TL;DR

The paper proposes a Hierarchical Complete-Recent (HCR) information fusion model that uses the features of the entire video sequence (complete features) together with the features of the video tail sequence (recent features), and explores the interrelationships between multiscale complete features and multiscale recent features.

Abstract

Anticipating human intention from videos has broad applications, such as autonomous driving, robot assistive technology, and virtual reality. This study addresses the problem of intention action anticipation: using egocentric video sequences to estimate the actions that indicate human intention. We propose a Hierarchical Complete-Recent (HCR) information fusion model that makes full use of the features of the entire video sequence (i.e., complete features) and the features of the video tail sequence (i.e., recent features). The HCR model has two primary mechanisms. The Guide-Feedback Loop (GFL) mechanism models the relation between one recent feature and one complete feature. Building on GFL, the MultiComplete-Recent Feature Aggregation (MCRFA) module models the relation of one recent feature with multiscale complete features. With GFL and MCRFA, the HCR model hierarchically explores the rich interrelationships between multiscale complete features and multiscale recent features. Through comparative and ablation experiments, we validate the effectiveness of our model on two well-known public datasets: EPIC-Kitchens and EGTEA Gaze+.

Paper Structure (15 sections, 16 equations, 7 figures, 9 tables)

Introduction
Related work
Intention action anticipation
Human-object interaction inference
Method
Feature extraction
Guide-Feedback Loop mechanism
MultiComplete-Recent Feature Aggregation
Hierarchical Complete-Recent Fusion
Experiments
Experiment setup
Comparison experiments
Diagnostic study
Qualitative results
Conclusion

Figures (7)

Figure 1: Visualization of intention action anticipation. Anticipation time $\tau$ is how much in advance the intention action has to be anticipated.
Figure 2: Overview of the Guide-Feedback Loop (GFL) mechanism. GFL comprises three stages: 1) the complete feature is updated to generate a global guiding feature ($\boldsymbol{GGF}$); 2) the global guiding feature guides the recent feature; 3) the guided recent feature feeds back to the updated complete feature.
Figure 3: Overview of MultiComplete-Recent Feature Aggregation (MCRFA) module. The MCRFA module models one recent feature with multiple complete features.
Figure 4: Hierarchical complete-recent fusion prediction. 'V' represents a verb, 'N' denotes a noun, and 'A' expresses an action.
Figure 5: Attention visualization of the complete and recent features. Red boxes indicate ground truth regions. 'Initial' represents the initial features. 'Single Attentioned' denotes the results for which only a single self-attention mechanism is used (i.e., Equation \ref{k1} reduces to $\boldsymbol{K}_1 = Soft(Conv(\boldsymbol{C}_1)) * \boldsymbol{C}_1$). 'Dual Attentioned' denotes the results generated by the dual self-attention mechanism (Equation \ref{k1}). 'Guided' denotes the results of recent features guided by $\boldsymbol{GGF}$ (Equation \ref{k2}).
...and 2 more figures
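
The three GFL stages named in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the paper's Equations k1 and k2 are not reproduced on this page, so the pooling, guiding, and feedback steps below are assumed forms chosen only to show the data flow. The single-attention step follows the expression $Soft(Conv(\boldsymbol{C}_1)) * \boldsymbol{C}_1$ from the Figure 5 caption, with the convolution assumed to be a dot product with a hypothetical weight vector `w`. The names `attend` and `guide_feedback_loop` are also hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(features, w):
    """Single self-attention in the spirit of Soft(Conv(C)) * C.

    Assumption: the 'Conv' is a 1x1 projection, i.e. a dot product with w,
    giving one score per time step.
    """
    weights = softmax([dot(f, w) for f in features])
    return [[a * x for x in f] for a, f in zip(weights, features)]


def guide_feedback_loop(complete, recent, w):
    """Illustrative three-stage loop; every formula here is an assumption."""
    dim = len(w)
    # Stage 1: update the complete feature, then pool it over time
    # into a global guiding feature (GGF).
    updated = attend(complete, w)
    ggf = [sum(f[d] for f in updated) for d in range(dim)]
    # Stage 2: the GGF guides the recent feature by re-weighting its
    # time steps according to their similarity to the GGF.
    guide = softmax([dot(f, ggf) for f in recent])
    guided = [[a * x for x in f] for a, f in zip(guide, recent)]
    # Stage 3: a summary of the guided recent feature feeds back
    # additively into the updated complete feature.
    summary = [sum(f[d] for f in guided) for d in range(dim)]
    fed_back = [[x + s for x, s in zip(f, summary)] for f in updated]
    return ggf, guided, fed_back


# Toy input: 3 complete time steps and 2 recent time steps, 2-D features.
complete = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recent = [[0.0, 1.0], [1.0, 1.0]]
ggf, guided, fed_back = guide_feedback_loop(complete, recent, [0.5, 0.5])
```

In this sketch the recent feature keeps its temporal length after guiding, and the complete feature keeps its length after feedback, so the loop could be repeated or stacked across scales, which is the role the abstract assigns to MCRFA and the hierarchical fusion.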
