MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans

Anna Deichler; Jim O'Regan; Jonas Beskow

TL;DR

A novel dataset of conversations between participants, recorded with a VR headset inside a physics simulator (AI2-THOR), that aims to extend co-speech gesture generation by incorporating rich contextual information within referential settings.

Abstract

In this paper, we present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating rich contextual information within referential settings. Participants engaged in various conversational scenarios, all based on referential communication tasks. The dataset provides a rich set of multimodal recordings such as motion capture, speech, gaze, and scene graphs. This comprehensive dataset aims to enhance the understanding and development of gesture generation models in 3D scenes by providing diverse and contextually rich data.

Paper Structure (17 sections, 2 figures, 1 table)

Introduction
Related Work
Co-Speech Gesture Generation
Human-Scene Interaction Datasets
Dataset Description
Data Collection
Data Annotation
Conclusions
Appendix
Scenarios
General Setup:
Scenario 1
Scenario 2
Scenario 3
Object distribution
...and 2 more sections

Figures (2)

Figure 1: (a) Illustration of the hardware setup, as described in the hardware appendix. In the experimental setup, the participants wear a motion capture suit with motion-tracking gloves and a VR headset with gaze tracking. (b) Top-down view of one of the scenes used in the experiments.
Figure 2: Object category distribution across simulated environments used in the experiments.
