1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction

Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Hyein Hwang, Soohyun Hwang, Junuk Cha, Jaewook Han, Seungryul Baek

TL;DR

This report describes the 1st place solution to the 8th HANDS workshop challenge (ARCTIC track), held in conjunction with ECCV 2024. The task is to reconstruct both hands and the object in 3D from a monocular video, without relying on predefined templates.

Abstract

This report describes our 1st place solution to the 8th HANDS workshop challenge (ARCTIC track), held in conjunction with ECCV 2024. The challenge addresses bimanual category-agnostic hand-object interaction reconstruction: generating 3D reconstructions of both hands and the object from a monocular video, without relying on predefined templates. The task is particularly challenging because of the significant occlusion and the dynamic contact between the hands and the object during bimanual manipulation. We address occlusion with a mask loss and dynamic contact with a 3D contact loss. We also apply 3D Gaussian Splatting (3DGS) to this task. Our method achieves 38.69 on the main metric, CD$_h$, on the ARCTIC test set.

Paper Structure (7 sections, 9 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Limitation of the HOLD baseline. From the original camera viewpoint, HOLD performs well on 2D contact reconstruction. However, it performs poorly in 3D contact reconstruction when seen from different camera viewpoints. As a result, it fails to accurately estimate the relative distance between the hand and the object, which worsens the main metric, CD$_h$.
  • Figure 2: Our method consists of 'Single Train' and 'Joint Train' stages. In the 'Single Train' stage, the appearance and geometry of the left hand, the right hand, and the object are reconstructed by fitting 3D Gaussian splats to each of them. In the 'Joint Train' stage, we additionally account for contact between the hands and the object and refine the resulting Gaussian splats.
  • Figure 3: Qualitative results. For each example, the first row visualizes a result in the camera view and the second row visualizes a result in the side view. We can observe that our method provides better alignment between the hand and the object in the side view.
  • Figure 4: Ablation study. (1) $L_{contact}$ encourages contact between the hand and the object in 3D space. (2) When $m$ is used instead of $\bar{m}$, the Gaussians are destroyed due to self-occlusion.
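
The captions above say that $L_{contact}$ encourages 3D contact between the hand and the object, but this excerpt does not give its formula. The sketch below is therefore only an illustration, not the authors' loss. It assumes a contact term can be approximated by the mean distance from each hand point to its nearest object point. The function name `contact_loss` and the toy point sets are hypothetical.

```python
import math


def contact_loss(hand_points, object_points):
    """Mean distance from each hand point to its nearest object point.

    Illustrative stand-in for a 3D contact term (an assumption, not the
    formula from the report). Points are (x, y, z) tuples.
    """
    if not hand_points or not object_points:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for h in hand_points:
        # Nearest-neighbour distance from this hand point to the object.
        total += min(math.dist(h, o) for o in object_points)
    return total / len(hand_points)


# Toy data: the object is offset from the hand only along z (depth), so the
# gap would be invisible in an x-y projection but is visible from the side.
hand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
obj_far = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
obj_near = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]

loss_far = contact_loss(hand, obj_far)
loss_near = contact_loss(hand, obj_near)
```

In this toy setup the loss is larger for `obj_far` than for `obj_near`, so minimizing it would pull the two point sets together along the depth axis. This is the kind of 3D misalignment that Figure 1 attributes to the HOLD baseline.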