Multi-Granularity Hand Action Detection
Ting Zhe, Jing Zhang, Yongqian Li, Yong Luo, Han Hu, Dacheng Tao
TL;DR
This paper tackles fine-grained hand action detection in kitchen videos by introducing FHA-Kitchens, a large-scale dataset with both coarse- and fine-grained hand action labels and precise localization of hand interaction regions. It also presents MG-HAD, an end-to-end DETR-based detector that handles multi-granularity action information via Multi-dimensional Action Queries and Coarse-Fine Contrastive Denoising, improving performance on both coarse- and fine-grained labels. The authors provide extensive dataset statistics, a three-track benchmark (SL-AD, SL-AR, DG), and ablations showing the effectiveness of the proposed designs. Overall, the work establishes a new dataset and a strong baseline for multi-granularity hand action detection, with potential impact on HCI, robotics, and video understanding.
Abstract
Detecting hand actions in videos is crucial for understanding video content and has diverse real-world applications. Existing approaches often focus on whole-body actions or coarse-grained action categories, lacking fine-grained hand-action localization information. To fill this gap, we introduce the FHA-Kitchens (Fine-Grained Hand Actions in Kitchen Scenes) dataset, providing both coarse- and fine-grained hand action categories along with localization annotations. This dataset comprises 2,377 video clips and 30,047 frames, annotated with approximately 200k bounding boxes and 880 action categories. Evaluation of existing action detection methods on FHA-Kitchens reveals varying generalization capabilities across different granularities. To handle multi-granularity in hand actions, we propose MG-HAD, an End-to-End Multi-Granularity Hand Action Detection method. It incorporates two new designs: Multi-dimensional Action Queries and Coarse-Fine Contrastive Denoising. Extensive experiments demonstrate MG-HAD's effectiveness for multi-granularity hand action detection, highlighting the significance of FHA-Kitchens for future research and real-world applications. The dataset and source code are available at https://github.com/superZ678/MG-HAD.
