MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans

Anna Deichler, Jim O'Regan, Jonas Beskow

TL;DR

MM-Conv is a novel dataset of conversations between participants, recorded with a VR headset inside a physics simulator (AI2-THOR). It aims to extend co-speech gesture generation by incorporating rich contextual information in referential settings.

Abstract

In this paper, we present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating rich contextual information within referential settings. Participants engaged in various conversational scenarios, all based on referential communication tasks. The dataset provides a rich set of multimodal recordings such as motion capture, speech, gaze, and scene graphs. This comprehensive dataset aims to enhance the understanding and development of gesture generation models in 3D scenes by providing diverse and contextually rich data.

Paper Structure

This paper contains 17 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: (a) Illustration of the hardware setup, described in the hardware appendix. Participants wear a motion capture suit with motion-tracking gloves and a VR headset with gaze tracking. (b) Top-down view of one of the scenes used in the experiments.
  • Figure 2: Object category distribution across simulated environments used in the experiments.