SMGDiff: Soccer Motion Generation using diffusion probabilistic models

Hongdi Yang, Chengyang Li, Zhenxuan Wu, Gaozheng Li, Jingya Wang, Jingyi Yu, Zhuo Su, Lan Xu

TL;DR

SMGDiff is a novel two-stage framework for generating real-time, user-controllable soccer motions. It significantly outperforms existing methods in motion quality and condition alignment.

Abstract

Soccer is a globally renowned sport with significant applications in video games and VR/AR. However, generating realistic soccer motions remains challenging due to the intricate interactions between the human player and the ball. In this paper, we introduce SMGDiff, a novel two-stage framework for generating real-time and user-controllable soccer motions. Our key idea is to integrate real-time character control with a powerful diffusion-based generative model, ensuring high-quality and diverse output motion. In the first stage, we instantly transform coarse user controls into diverse global trajectories of the character. In the second stage, we employ a transformer-based autoregressive diffusion model to generate soccer motions based on trajectory conditioning. We further incorporate a contact guidance module during inference to optimize the contact details for realistic ball-foot interactions. Moreover, we contribute a large-scale soccer motion dataset consisting of over 1.08 million frames of diverse soccer motions. Extensive experiments demonstrate that our SMGDiff significantly outperforms existing methods in terms of motion quality and condition alignment.
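The two stages described in the abstract can be pictured as one step of an autoregressive loop: stage 1 turns coarse user controls into a future trajectory, and stage 2 samples future motion from a diffusion model conditioned on that trajectory. The NumPy code below is a minimal sketch of that data flow under stated assumptions, not the authors' implementation. The sizes, the one-hot skill encoding, the linear noise schedule, the jittered straight-line `generate_trajectory` (a hypothetical stand-in for the learned trajectory model), the sliding-window update, and the zero-returning `denoiser` stub (assumed to predict the clean motion $\hat{\mathbf{X}}_{0}^{\mathcal{F}}$ directly) are all hypothetical.

```python
import numpy as np

# Hypothetical sizes: past/future window lengths, pose and trajectory widths,
# number of skill labels, and diffusion steps are illustrative only.
N_PAST, N_FUT, POSE_DIM, TRAJ_DIM, N_SKILLS, T_STEPS = 10, 20, 24, 2, 6, 50

# Assumed standard DDPM linear beta schedule (not taken from the paper).
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def generate_trajectory(skill_id, goal, past_traj, rng):
    """Hypothetical stage-1 stand-in: a jittered straight line from the last
    past root position to the user target G (skill_id is unused here)."""
    start = past_traj[-1]
    w = np.linspace(0.0, 1.0, N_FUT + 1)[1:, None]  # N_FUT interpolation weights
    line = (1.0 - w) * start + w * goal
    return line + 0.05 * rng.standard_normal(line.shape)  # jitter -> varied outputs

def build_condition(skill_id, future_traj, past_motion):
    """C = concat(S, T^F, X^P), with S one-hot encoded by assumption."""
    s = np.zeros(N_SKILLS)
    s[skill_id] = 1.0
    return np.concatenate([s, future_traj.ravel(), past_motion.ravel()])

def denoiser(x_t, t, cond):
    """Placeholder for the learned conditional transformer, assumed to predict
    the clean motion X_0^F; this stub ignores its inputs and returns zeros."""
    return np.zeros_like(x_t)

def sample_future_motion(cond, rng):
    """DDPM ancestral sampling from X_T^F using the x_0-parameterized posterior."""
    x = rng.standard_normal((N_FUT, POSE_DIM))  # X_T^F: pure Gaussian noise
    for t in range(T_STEPS - 1, 0, -1):
        x0_hat = denoiser(x, t, cond)
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
        x = coef_x0 * x0_hat + coef_xt * x + np.sqrt(var) * rng.standard_normal(x.shape)
    return denoiser(x, 0, cond)  # final clean estimate

def step(skill_id, goal, past_traj, past_motion, rng):
    """One autoregressive step: stage 1 (trajectory), then stage 2 (motion)."""
    future_traj = generate_trajectory(skill_id, goal, past_traj, rng)
    cond = build_condition(skill_id, future_traj, past_motion)
    future_motion = sample_future_motion(cond, rng)
    # Assumed sliding window: the newest N_PAST frames become the next past context.
    past_traj = np.concatenate([past_traj, future_traj])[-N_PAST:]
    past_motion = np.concatenate([past_motion, future_motion])[-N_PAST:]
    return future_motion, past_traj, past_motion

rng = np.random.default_rng(0)
past_traj = np.zeros((N_PAST, TRAJ_DIM))
past_motion = np.zeros((N_PAST, POSE_DIM))
motion, past_traj, past_motion = step(2, np.array([3.0, 1.0]), past_traj, past_motion, rng)
print(motion.shape)  # (20, 24)
```

Because the stub denoiser ignores its inputs, this only demonstrates the shapes of $\mathbf{C}$ and $\mathbf{X}^{\mathcal{F}}$ and the order of operations; a trained conditional network and trajectory model would take the place of the two stand-ins.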

Paper Structure

This paper contains 14 sections, 15 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Pipeline of SMGDiff. Our framework consists of two stages. In the trajectory generation stage, we transform the soccer skill label $\mathbf{S}$, the target trajectory point $\mathbf{G}$ from user control, and the past trajectory $\mathbf{T}^{\mathcal{P}}$ into a refined future trajectory $\mathbf{T}^{\mathcal{F}}$. In the soccer motion generation stage, the soccer motion diffusion model takes a noisy motion sequence $\mathbf{X}_{T}^{\mathcal{F}}$ and the condition $\mathbf{C}$, which concatenates $\mathbf{S}$, $\mathbf{T}^{\mathcal{F}}$, and the past motion $\mathbf{X}^{\mathcal{P}}$. The contact guidance module refines the predicted soccer motion $\hat{\mathbf{X}}_{0}^{\mathcal{F}}$ during the diffusion process to improve contact details.
  • Figure 2: The top section shows selected highlights of our dataset. The bottom section shows the proportions of the different soccer motion types. In total, our dataset comprises 2398 sequences and approximately 1.08 million frames.
  • Figure 3: Qualitative comparison between our method and the baselines LMP [Starke2020], MANN-DP [manndeepphase], and CM [codebookmatching]. The green lines trace the trajectories of the hands and feet. Motions generated by the baselines show deficiencies in motion quality, such as foot sliding and inaccurate skill execution. Our method significantly surpasses the baselines in motion detail. More qualitative results can be found in the supplementary video.
  • Figure 4: Qualitative evaluation of the Trajectory Generation Model (TGM). Given identical conditions, TGM enhances motion diversity. The dashed line represents the generated trajectory.
  • Figure 5: Qualitative evaluation of the Contact Guidance Module (CGM). Given identical conditions, CGM effectively prevents missed contacts when the ball changes direction. Contact frames mark the points where the ball's trajectory changes direction.
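Figures 1 and 5 describe the contact guidance module as refining the predicted motion $\hat{\mathbf{X}}_{0}^{\mathcal{F}}$ during sampling so that the foot meets the ball where the ball changes direction. One common way to realize this kind of inference-time guidance is a gradient step on a differentiable contact cost; the NumPy sketch below assumes that form and is not the paper's exact formulation. The pose layout (`FOOT_COLS`), the squared foot-ball distance cost, the step size, and the angle threshold used to pick contact frames are hypothetical.

```python
import numpy as np

# Hypothetical pose layout: columns 0..2 of each frame hold the 3D position
# of the foot that should touch the ball.
FOOT_COLS = slice(0, 3)

def direction_change_frames(ball_pos, min_angle_deg=30.0, eps=1e-6):
    """Frames where the ball's velocity turns by more than min_angle_deg."""
    v = np.diff(ball_pos, axis=0)              # v[i] = p[i+1] - p[i]
    a, b = v[:-1], v[1:]
    na, nb = np.linalg.norm(a, axis=1), np.linalg.norm(b, axis=1)
    moving = (na > eps) & (nb > eps)           # ignore a resting ball
    cos = np.sum(a * b, axis=1) / np.maximum(na * nb, eps)
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.nonzero(moving & (angle > min_angle_deg))[0] + 1  # turn occurs at p[i+1]

def contact_loss(x0_hat, ball_pos, contact_frames):
    """Sum of squared foot-ball distances over the contact frames."""
    diff = x0_hat[contact_frames, FOOT_COLS] - ball_pos[contact_frames]
    return float(np.sum(diff ** 2))

def contact_guidance(x0_hat, ball_pos, contact_frames, step=0.25, n_iters=1):
    """Refine x0_hat by gradient descent on contact_loss (analytic gradient)."""
    x = x0_hat.copy()
    for _ in range(n_iters):
        grad = np.zeros_like(x)
        grad[contact_frames, FOOT_COLS] = 2.0 * (
            x[contact_frames, FOOT_COLS] - ball_pos[contact_frames])
        x -= step * grad                       # only contact-frame foot entries move
    return x

# Usage: the ball rolls along +x, then is redirected along +y at frame 3.
ball = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [3, 1, 0], [3, 2, 0]], float)
frames = direction_change_frames(ball)
print(frames)  # [3]
rng = np.random.default_rng(0)
x0_hat = rng.standard_normal((len(ball), 8))   # 8 = hypothetical pose width
refined = contact_guidance(x0_hat, ball, frames)
print(round(contact_loss(refined, ball, frames) / contact_loss(x0_hat, ball, frames), 6))  # 0.25
```

With `step=0.25`, each iteration halves the foot-ball offset at the contact frames and leaves every other entry untouched. In a DDPM-style sampler, such a function would be applied to the network's $\hat{\mathbf{X}}_{0}^{\mathcal{F}}$ at each denoising step, before the posterior mean is computed.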