A Synchronized Audio-Visual Multi-View Capture System

Xiangwei Shi; Era Dorta Perez; Ruud de Jong; Ojas Shirekar; Chirag Raman

Abstract

Multi-view capture systems have long been an important research tool for recording human motion under controlled conditions. Most existing systems are designed around video streams and provide little or no support for audio acquisition or rigorous audio-video alignment, even though both are essential for studying conversational interaction, where timing at the level of turn-taking, overlap, and prosody matters. In this technical report, we describe an audio-visual multi-view capture system that addresses this gap by treating synchronized audio and synchronized video as first-class signals. The system combines a multi-camera pipeline with multi-channel microphone recording under a unified timing architecture, and provides a practical workflow for calibration, acquisition, and quality control that supports repeatable recordings at scale. We quantify synchronization performance in deployment and show that the resulting recordings are temporally consistent enough to support fine-grained analysis and data-driven modeling of conversational behavior.

Paper Structure (29 sections, 10 figures)

Introduction
Related Work
Foundations of multi-view video capture.
Scaling to multi-person and social interaction.
Toward Integrated Audio-Visual Capture Systems.
System Design
Capture volume frame.
Cameras.
Lens.
Audio subsystem.
Multi-channel wireless microphones.
Audio interface and conversion.
Network connectivity and storage.
Lighting.
Synchronization
...and 14 more sections

Figures (10)

Figure 1: Top and interior views of the modular capture frame. The structure is organized into three horizontal mounting levels, allowing flexible placement of cameras and lighting at arbitrary positions.
Figure 2: Overview of the camera unit. The right figure illustrates some key functionalities and supported synchronization inputs (e.g., LTC, SYNC).
Figure 3: Left: the control computer and microphone rack housing the audio interface and wireless receivers. Right: a close-up of a wearable bodypack transmitter from a single microphone channel.
Figure 4: Left: LED panel. Right: light diffuser.
Figure 5: PTP synchronization among cameras. One master camera distributes its internal clock to the remaining slave cameras. Thus, all cameras have a unified internal clock.
...and 5 more figures
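
The Figure 5 caption describes a master camera distributing its clock to the slave cameras over PTP. As a rough illustration of how a slave can estimate its clock offset from one timestamp exchange, the sketch below uses the standard PTP delay request-response arithmetic. It assumes a symmetric network path; the names `PtpExchange` and `offset_and_delay` are hypothetical and are not taken from the report or from any camera SDK.

```python
from dataclasses import dataclass


@dataclass
class PtpExchange:
    """Four timestamps from one PTP exchange (hypothetical container)."""
    t1: float  # master sends Sync, read on the master clock
    t2: float  # slave receives Sync, read on the slave clock
    t3: float  # slave sends Delay_Req, read on the slave clock
    t4: float  # master receives Delay_Req, read on the master clock


def offset_and_delay(x: PtpExchange) -> tuple[float, float]:
    """Return (slave clock offset, one-way path delay).

    Assumes the path delay is the same in both directions, which is
    the usual PTP simplification and may not hold on a real network.
    """
    forward = x.t2 - x.t1   # path delay + offset
    backward = x.t4 - x.t3  # path delay - offset
    offset = (forward - backward) / 2.0
    delay = (forward + backward) / 2.0
    return offset, delay


if __name__ == "__main__":
    # Illustrative numbers only: slave runs 5 units ahead, path delay is 2.
    exchange = PtpExchange(t1=100.0, t2=107.0, t3=120.0, t4=117.0)
    offset, delay = offset_and_delay(exchange)
    print(f"offset={offset}, delay={delay}")
```

A slave would subtract the estimated offset from its own clock to align with the master; how the cameras in this system actually apply the correction is not stated in this summary.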
