Generating Emotive Gaits for Virtual Agents Using Affect-Based Autoregression

Uttaran Bhattacharya, Nicholas Rewkowski, Pooja Guhan, Niall L. Williams, Trisha Mittal, Aniket Bera, Dinesh Manocha

TL;DR

A novel autoregression network is presented to generate virtual agents that convey various emotions through their walking styles or gaits so that the virtual agents can express and transition between emotions represented as combinations of happy, sad, angry, and neutral.

Abstract

We present a novel autoregression network to generate virtual agents that convey various emotions through their walking styles or gaits. Given the 3D pose sequences of a gait, our network extracts pertinent movement features and affective features from the gait. We use these features to synthesize subsequent gaits such that the virtual agents can express and transition between emotions represented as combinations of happy, sad, angry, and neutral. We incorporate multiple regularizations in the training of our network to simultaneously enforce plausible movements and noticeable emotions on the virtual agents. We also integrate our approach with an AR environment using a Microsoft HoloLens and can generate emotive gaits at interactive rates to increase the social presence. We evaluate how human observers perceive both the naturalness and the emotions from the generated gaits of the virtual agents in a web-based study. Our results indicate around 89% of the users found the naturalness of the gaits satisfactory on a five-point Likert scale, and the emotions they perceived from the virtual agents are statistically similar to the intended emotions of the virtual agents. We also use our network to augment existing gait datasets with emotive gaits and will release this augmented dataset for future research in emotion prediction and emotive gait synthesis. Our project website is available at https://gamma.umd.edu/gen_emotive_gaits/.

Paper Structure

This paper contains 49 sections, 11 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Our Affective features: The top left figure shows our pose graph as a directed tree, with the joints numbered $0$ through $20$. We use $18$ affective features, counting $11$ joint angles, $4$ distance ratios, and $3$ area ratios. The joint angles are labeled $A_1$ through $A_{11}$, and marked with red arcs on the last three figures in the top row. The leftmost figure on the bottom row shows the distances we use to compute the distance ratios. We use the ratios $\frac{D_1}{D_2}$, $\frac{D_3}{D_4}$, $\frac{D_6}{D_5}$, and $\frac{D_7}{D_5}$. The last three figures on the bottom row show the triangles we use to compute area ratios. We use the ratios $\frac{T_1}{T_2}$, $\frac{T_3}{T_4}$, $\frac{T_5}{T_6}$. These features are used by our network to generate emotive gaits of the virtual agents.
  • Figure 2: Movement features: We show the root height from the ground $h^t$, the root speed $s^t$, and the stepping phase $\theta^t$. The root speed is the distance travelled between time steps $t$ and $t-1$. The stepping phase $\theta^t=0$ when the left foot touches the ground at time step $t$, $\theta^{t + \Delta t}=\pi$ when the right foot touches the ground at time step $t + \Delta t$, and $\theta^{t + \Delta t + \tau}=0$ when the left foot touches the ground again. We fill in the values for $\theta^t$ between these time steps using linear interpolation. We use these features in our autoregression network.
  • Figure 3: Our autoregression network for emotive gaits: Our network takes in the joint rotations, input emotions as vectors consisting of probabilities for happy, sad, angry, and neutral, pose affective features, and movement features and jointly maps them to a latent representation space through the encoder. The predictor then takes in the latent representations and predicts gaits for subsequent time steps that follow the input trajectory while expressing the input emotions. The green boxes denote concatenation, and the cyan box at the end of the predictor denotes normalization of the variables to versors.
  • Figure 4: Emotional expressions and transitions. Each row shows four snapshots of synthesized gaits in temporal sequence from left to right. The top two rows show gaits with single emotions. The bottom row shows gaits transitioning from one emotion to another.
  • Figure 5: Comparison and Ablation Studies. $\left(a\right)$ and $\left(c\right)$ show four snapshots, in temporal sequence from left to right, of emotive gaits generated by our network following user-driven trajectories. $\left(b\right)$ shows the results at the same four time instances for QuaterNet, which has no emotive component. $\left(d\right)$ shows the results at the same four time instances for our network without the affective feature component. In this case, the gait is able to follow the trajectory, but not express the emotions (e.g., no shoulder slouching to indicate sadness). $\left(e\right)$ shows the results at the same four time instances for our network without the movement feature component. In this case, the gait is able to express emotions, but not follow the desired trajectory.
  • ...and 3 more figures
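
The feature definitions in the captions of Figures 1 through 3 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation. The function names are hypothetical. The captions do not say which joints form each angle, distance, or triangle, so the inputs here are arbitrary 3D points. The phase is also assumed to keep increasing from $\pi$ to $2\pi$ before wrapping back to $0$ at the next left-foot contact, which is one reading of the linear interpolation described in Figure 2.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle in radians at point b between segments b->a and b->c."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))


def triangle_area(p, q, r):
    """Area of the 3D triangle (p, q, r) via the cross product."""
    p, q, r = (np.asarray(x, dtype=float) for x in (p, q, r))
    return 0.5 * float(np.linalg.norm(np.cross(q - p, r - p)))


def stepping_phase(num_frames, contacts):
    """Per-frame stepping phase in [0, 2*pi).

    contacts: list of (frame_index, foot) pairs sorted by frame, with foot
    in {"L", "R"} and strictly alternating. Left contacts map to phase 0
    and right contacts to phase pi (as in Figure 2). Frames outside the
    first and last contact keep the nearest contact's phase.
    """
    frames = np.array([f for f, _ in contacts], dtype=float)
    start = 0.0 if contacts[0][1] == "L" else np.pi
    # Assumption: unwrap the phase so it grows by pi per contact, then wrap.
    unwrapped = start + np.pi * np.arange(len(contacts))
    t = np.arange(num_frames, dtype=float)
    return np.mod(np.interp(t, frames, unwrapped), 2.0 * np.pi)


def to_versor(q):
    """Normalize a quaternion (4-vector) to unit length, i.e. a versor."""
    q = np.asarray(q, dtype=float)
    return q / np.linalg.norm(q)


# Toy usage with made-up points and contact frames.
angle = joint_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
dist_ratio = distance((0, 0, 0), (0, 3, 0)) / distance((0, 0, 0), (0, 0, 1.5))
area_ratio = triangle_area((0, 0, 0), (2, 0, 0), (0, 2, 0)) / triangle_area(
    (0, 0, 0), (1, 0, 0), (0, 2, 0)
)
phase = stepping_phase(21, [(0, "L"), (10, "R"), (20, "L")])
versor = to_versor([2.0, 0.0, 0.0, 0.0])
```

Ratios of distances and areas are unaffected by uniform scaling of the skeleton, so features of this kind stay comparable across agents of different sizes. The final normalization mirrors the cyan box in Figure 3, where the predictor's outputs are rescaled to unit quaternions before being used as joint rotations.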