M3Act: Learning from Synthetic Human Group Activities

Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

TL;DR

M3Act is a Unity-based, large-scale synthetic data generator for multi-view, multi-group human activities that produces rich 2D and 3D annotations across single-person, multi-person, and multi-group settings. Using 25 scenes, 104 HDRIs, 2,200 human models, and 384 animations, the authors build two datasets, M3ActRGB and M3Act3D, to support tasks such as multi-person tracking and group activity recognition, and they introduce a new task: controllable 3D group activity generation (GAG). Empirically, the synthetic data substantially improves MOTRv2 on DanceTrack, improves group activity recognition (GAR) on CAD2 when used for pretraining, and enables competitive 3D group activity generation with diffusion-based baselines that benefit from modeling inter-person interactions. The work shows that synthetic data can reduce the need for real data and support few-shot transfer, and it points to new research directions in controllable social-motion synthesis, while acknowledging the domain gap and asset limitations as areas for future work.
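The generator combines fixed asset pools (25 scenes, 104 HDRIs, 2,200 human models, 384 animations) into many randomized simulations. As an illustration only, here is a minimal Python sketch of drawing one simulation configuration from pools of those sizes. Everything beyond the pool sizes is an assumption: the names `SimulationConfig`, `GroupSpec`, and `sample_config`, the group-count and group-size limits, and the choice of one independently drawn animation per person are hypothetical and do not describe M3Act's actual Unity pipeline.

```python
import random
from dataclasses import dataclass, field

# Asset pool sizes quoted in the TL;DR; everything else here is hypothetical.
NUM_SCENES = 25
NUM_HDRIS = 104
NUM_HUMAN_MODELS = 2200
NUM_ANIMATIONS = 384


@dataclass
class GroupSpec:
    """Hypothetical record for one semantic group in a simulation."""
    human_ids: list[int]
    animation_ids: list[int]


@dataclass
class SimulationConfig:
    """Hypothetical record for one randomized simulation."""
    scene_id: int
    hdri_id: int
    groups: list[GroupSpec] = field(default_factory=list)


def sample_config(rng: random.Random, max_groups: int = 3,
                  max_group_size: int = 6) -> SimulationConfig:
    """Sample a scene, an HDRI, and several groups of animated humans.

    max_groups and max_group_size are illustrative limits, not values
    taken from M3Act.
    """
    scene_id = rng.randrange(NUM_SCENES)
    hdri_id = rng.randrange(NUM_HDRIS)
    num_groups = rng.randint(1, max_groups)
    sizes = [rng.randint(1, max_group_size) for _ in range(num_groups)]

    # Draw distinct human models so no character appears twice in one simulation.
    humans = rng.sample(range(NUM_HUMAN_MODELS), sum(sizes))

    groups = []
    start = 0
    for size in sizes:
        members = humans[start:start + size]
        start += size
        # Assumption: each member gets an independently drawn animation.
        animations = rng.choices(range(NUM_ANIMATIONS), k=size)
        groups.append(GroupSpec(members, animations))
    return SimulationConfig(scene_id, hdri_id, groups)


if __name__ == "__main__":
    config = sample_config(random.Random(0))
    print(config.scene_id, config.hdri_id, len(config.groups))
```

Human models are drawn without replacement so each character is unique within a simulation, while animations are drawn with replacement because several people can plausibly perform the same action.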

Abstract

The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress on related tasks is often hindered by the difficulty of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view, multi-group, multi-person human atomic actions and group activities. Powered by the Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitate the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act through three core experiments. The results suggest that our synthetic dataset can significantly improve the performance of several downstream methods and can replace real-world data to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, moving it from 10th to 2nd place on the leaderboard. Moreover, M3Act opens a new research direction in controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for this novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.

Paper Structure (13 sections, 5 equations, 7 figures, 12 tables, 1 algorithm)

Introduction
Related Works
M3Act
Data Generation
Dataset Statistics
Experiments
Multi-Person Tracking
Group Activity Recognition
Controllable 3D Group Activity Generation
Evaluation
Results
Discussion and Conclusion
Acknowledgement

Figures (7)

Figure 1: M3Act is a large-scale synthetic data generator designed to support multi-person and multi-group research topics. M3Act features multiple semantic groups and produces highly diverse and photorealistic videos with a rich set of annotations suitable for human-centered tasks, including multi-person tracking, group activity recognition, and controllable human group activity generation.
Figure 2: The data generation process of M3Act. It consists of multiple data simulations, each with scene instantiation, group activity authoring, and a data capture module. A high degree of randomization is involved in every part of the process to ensure diverse data.
Figure 3: Qualitative comparison of two group activities from ground truth (GT), MDM, and MDM+IFormer. The spatial distribution of persons generated by MDM+IFormer is closer to GT.
Figure 4: Distributions of M3ActRGB.
Figure 5: Distributions of M3Act3D.
