Synergy and Synchrony in Couple Dances

Vongani Maluleke; Lea Müller; Jathushan Rajasegaran; Georgios Pavlakos; Shiry Ginosar; Angjoo Kanazawa; Jitendra Malik

TL;DR

The paper investigates how social interaction influences future human motion by studying a dyadic coupling scenario in Swing dance. It introduces a discretized, factorized motion representation using three separate VQ-VAE codebooks for pose, orientation, and translation, and a transformer-based autoregressive predictor that operates on codebook indices. By comparing unary and dyadic prediction, the work demonstrates that conditioning on a partner’s motion yields more realistic, diverse, and synchronized predictions, and it provides an in-the-wild Swing dataset with 3D pseudo-ground-truth motion to enable further research. Overall, the results show that social context substantially improves future motion prediction in close human interactions, with implications for socially aware motion synthesis and analysis.
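The factorized tokenization described above rests on the quantization step of a VQ-VAE. Each encoder output is replaced by the index of its nearest codebook vector, and pose, orientation, and translation each use their own codebook. The NumPy sketch below assumes the encoders have already produced one latent vector per frame. The codebook sizes, the latent dimension, and the `quantize` helper are hypothetical choices for illustration, not the paper's settings, and training (codebook and commitment losses) is omitted.

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each latent vector to the index of its nearest codebook entry (L2 distance).

    latents:  (T, D) encoder outputs, one per frame
    codebook: (K, D) learned code vectors
    Returns (indices of shape (T,), quantized vectors of shape (T, D)).
    """
    # Squared Euclidean distance between every latent and every code vector: (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

rng = np.random.default_rng(0)
T, D = 8, 16  # hypothetical: 8 frames, 16-dim latents
# Hypothetical codebook sizes; each body-model parameter gets its own codebook.
codebooks = {
    "pose": rng.normal(size=(512, D)),
    "orientation": rng.normal(size=(256, D)),
    "translation": rng.normal(size=(256, D)),
}
# Random stand-ins for the per-parameter encoder outputs.
latents = {name: rng.normal(size=(T, D)) for name in codebooks}

tokens = {}
for name, cb in codebooks.items():
    idx, zq = quantize(latents[name], cb)
    tokens[name] = idx  # integer indices: what the autoregressive transformer consumes
    print(name, idx.shape, zq.shape)
```

Because the three codebooks are separate, each time step yields three indices. This is consistent with Figure 4, where pose, orientation, and translation tokens enter the transformer as separate inputs.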

Abstract

This paper asks to what extent social interaction influences one's behavior. We study this in the setting of two dancers dancing as a couple. We first consider a baseline in which we predict a dancer's future moves conditioned only on their past motion, without regard to their partner. We then investigate the advantage of taking social information into account by also conditioning on the motion of their dancing partner. We focus our analysis on Swing, a dance genre with tight physical coupling, for which we present an in-the-wild video dataset. We demonstrate that single-person future motion prediction in this context is challenging. In contrast, we observe that prediction greatly benefits from considering the interaction partner's behavior, resulting in surprisingly compelling couple dance synthesis results (see supplementary video). Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild couple dance video dataset to enable future research in this direction. Video results are available on the project website: https://von31.github.io/synNsync
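The unary and dyadic settings differ only in the context the predictor sees at each step of the test-time loop illustrated in Figure 2. The Python sketch below makes that difference concrete under several simplifying assumptions. It uses one token per time step, whereas the paper uses separate pose, orientation, and translation tokens. It also assumes Bob's ground truth is visible up to the step before the one being predicted. `rollout` and `toy_predictor` are hypothetical names, and the toy predictor only stands in for the trained transformer.

```python
from typing import Callable, Optional, Sequence

# A predictor maps (Alice's tokens so far, Bob's tokens so far or None) to Alice's next token.
Predictor = Callable[[Sequence[int], Optional[Sequence[int]]], int]

def rollout(alice_gt: Sequence[int], bob_gt: Optional[Sequence[int]],
            t_pi: int, horizon: int, predict_next: Predictor) -> list[int]:
    """Autoregressively extend Alice's token sequence beyond t_pi.

    Unary task: bob_gt is None, so only Alice's own history is used.
    Dyadic task: Bob's ground-truth tokens up to the current step are also provided.
    """
    alice = list(alice_gt[: t_pi + 1])  # ground truth for t <= t_pi
    for t in range(t_pi, t_pi + horizon):
        # Assumption: Bob is visible up to step t when predicting Alice's step t + 1.
        bob_ctx = None if bob_gt is None else bob_gt[: t + 1]
        alice.append(predict_next(alice, bob_ctx))  # predictions are fed back as context
    return alice[t_pi + 1:]

def toy_predictor(alice_ctx: Sequence[int], bob_ctx: Optional[Sequence[int]]) -> int:
    # Hypothetical stand-in: mirror Bob's latest token if available, else repeat Alice's.
    return bob_ctx[-1] if bob_ctx is not None else alice_ctx[-1]

alice_gt = [3, 3, 5, 5, 7, 7, 9, 9]
bob_gt = [4, 4, 6, 6, 8, 8, 10, 10]
print(rollout(alice_gt, None, t_pi=3, horizon=4, predict_next=toy_predictor))    # [5, 5, 5, 5]
print(rollout(alice_gt, bob_gt, t_pi=3, horizon=4, predict_next=toy_predictor))  # [6, 8, 8, 10]
```

In the unary run, the prediction can only extrapolate from Alice's own history. In the dyadic run, every step receives fresh information from the partner. This is the intuition behind the slower error growth reported for the dyadic case.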

Paper Structure

This paper contains 15 sections, 15 equations, 6 figures, and 3 tables.

Introduction
Related Work
Motion Prediction in Couple Dance
Problem Definitions
Learning Quantized Motion Codebooks
Conditional Autoregressive Prediction
Implementation Details
3D Human Couple Dancing Dataset
Evaluation
VQ-VAE Codebook Ablation
Quantitative Results
Qualitative Results
Limitations
Conclusion
Acknowledgements

Figures (6)

Figure 1: To what extent does Bob's behavior affect Alice's behavior? We study this question in a couple dance: an example of full-body dyadic physical social interaction. We predict the full-body motion of a dancer, Alice (orange), given her own past motion (gray) and the motion of her partner, Bob (blue).
Figure 2: Test-time autoregressive prediction in the unary (left) and dyadic (right) tasks. In the unary prediction task, we start from Alice's past motion up to time $t_\pi$ and predict the next time step in each iteration. In the dyadic prediction task, we start from Alice's and Bob's motion up to time $t_\pi$. In each step, we predict Alice's next token from her past ground-truth motion until $t_\pi$, her predicted motion for $t > t_\pi$, and Bob's past ground-truth motion.
Figure 3: Illustration of the VQ-VAEs. We learn three separate codebooks, one for each body model parameter. An encoder, $E_{\cdot}$, maps the body parameter to the codebook, $\mathcal{Z}_{\cdot}$. The decoder, $D_{\cdot}$, maps the codebook latent vectors back into body model parameter space. To obtain 3D meshes, we jointly pass the parameters through the body model function.
Figure 4: Transformer training procedure in the dyadic case. The core of our network is a transformer-decoder block with causal masking, such that Alice and Bob can only attend to their past motion. The inputs to our model are Alice's and Bob's codebook indices for body pose, $\Theta$, orientation, $\Phi$, and translation, $\Gamma$. We embed the tokens into a latent space and add time, person, and parameter encodings. The final layer in our network outputs probability scores over codebook indices, representing the likelihood of each index being the next motion $\mathbf{s}_{t_{\pi}+1}$.
Figure 5: Prediction error grows over time, more so for unary prediction. Graphs of the MPJPE and PA-MPJPE metrics over time computed for ours vs. baselines (in comparison with ground truth), starting from $t=t_\pi$, the point at which we start predicting. While our predicted future motion is initially correct in both conditions, prediction error grows faster over time in the unary case than in the dyadic case. This is expected, since motion during physical interaction depends heavily on one's interaction partner. In contrast, all baselines start from a high error at $t_\pi$.
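Figure 5 reports MPJPE and PA-MPJPE. The NumPy sketch below gives the standard definitions, assuming each frame is a (J, 3) array of 3D joint positions. PA-MPJPE first aligns the prediction to the ground truth with a similarity (scale, rotation, translation) Procrustes fit. The function names are hypothetical, and the paper's joint set and units are not stated in this summary.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over joints. Shapes (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred to gt with the least-squares similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance yields the optimal rotation for row vectors (X @ R ~ Y).
    U, S, Vt = np.linalg.svd(X.T @ Y)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    Dm = np.diag([1.0, 1.0, d])
    R = U @ Dm @ Vt
    scale = (S * np.diag(Dm)).sum() / (X ** 2).sum()
    return scale * X @ R + mu_g

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment: ignores global scale, rotation, and translation."""
    return mpjpe(procrustes_align(pred, gt), gt)

rng = np.random.default_rng(0)
gt = rng.normal(size=(22, 3))  # hypothetical 22-joint skeleton
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = 1.2 * gt @ Rz + np.array([0.5, 0.0, -0.3])  # same pose, different scale/rotation/offset
print(f"MPJPE: {mpjpe(pred, gt):.3f}  PA-MPJPE: {pa_mpjpe(pred, gt):.2e}")
```

A globally transformed copy of the ground truth has a large MPJPE but a near-zero PA-MPJPE, so the two curves in Figure 5 separate errors in global placement from errors in body articulation. Per-frame curves like those in the figure could be obtained by applying either function to each time step of a sequence.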
