
Social Processes: Probabilistic Meta-learning for Adaptive Multiparty Interaction Forecasting

Augustinas Jučas, Chirag Raman

TL;DR

This work tackles group-level social cue forecasting by formalizing Social Cue Forecasting (SCF) and introducing Social Processes (SP) models that treat each conversational group as a meta-learning task. SP extends Neural Processes with a Seq2Seq encoder–decoder that jointly predicts futures for all group members, incorporating both self and partner dynamics and allowing non-contiguous observation/future windows; the Attentive SP (ASP) variant adds cross-attention to condition predictions on context more precisely. Through synthetic experiments, the authors show SP models yield well-calibrated distributions over futures and reveal meaningful latent structure, while generalization hinges on the diversity of social behaviors seen during training. This framework lays a foundation for adaptive, group-aware interaction forecasting with potential applications in social robotics and human–AI collaboration, while highlighting ethical considerations and the need for diverse training data to support robust extrapolation.

Abstract

Adaptively forecasting human behavior in social settings is an important step toward achieving Artificial General Intelligence. Most existing research in social forecasting has focused either on unfocused interactions, such as pedestrian trajectory prediction, or on monadic and dyadic behavior forecasting. In contrast, social psychology emphasizes the importance of group interactions for understanding complex social dynamics. This creates a gap that we address in this paper: forecasting social interactions at the group (conversation) level. Additionally, it is important for a forecasting model to be able to adapt to groups unseen at train time, as even the same individual behaves differently across different groups. This highlights the need for a forecasting model to explicitly account for each group's unique dynamics. To achieve this, we adopt a meta-learning approach to human behavior forecasting, treating every group as a separate meta-learning task. As a result, our method conditions its predictions on the specific behaviors within the group, leading to generalization to unseen groups. Specifically, we introduce Social Process (SP) models, which predict a distribution over future multimodal cues jointly for all group members based on their preceding low-level multimodal cues, while incorporating other past sequences of the same group's interactions. In this work we also analyze the generalization capabilities of SP models in both their outputs and latent spaces through the use of realistic synthetic datasets.

Paper Structure (14 sections, 12 equations, 12 figures, 1 table)

Figures (12)

  • Figure 1: Illustration of the two forecasting approaches on a real-world situation from the MatchNMingle dataset [cabrera2018matchnmingle]. The top part of the figure illustrates a high-order group-leaving event [vandoornRitualsLeavingPredictive2018], where an individual leaves one group (in $\bm{t}_\mathrm{obs}$) for another (in $\bm{t}_\mathrm{fut}$). The bottom part depicts the low-level social cues $\bm{b}^i_t$ used as features for prediction: head pose (solid normal), body pose (hollow normal), and speaking status (speaker in orange). In the top-down approach (a), the goal is to predict the group-leaving label, so only 200 samples can be generated from 90 minutes of interaction [vandoornRitualsLeavingPredictive2018]. In our proposed bottom-up, self-supervised formulation of Social Cue Forecasting (b), the task is instead to predict the future low-level cues, which makes it possible to use all 90 minutes of data.
  • Figure 2: Architecture of the SP and ASP family.
  • Figure 3: Encoding partner behavior for conversation participant $\mathrm{p}^0$ for a single timestep. To model the influence partners $\mathrm{p}^1$ and $\mathrm{p}^2$ have on the behavior of $\mathrm{p}^0$, we transform the partner features to capture the interaction from $\mathrm{p}^0$'s perspective, and learn a representation of these features invariant to group size and partner-order permutation using the symmetric $\mathrm{max}$ function.
  • Figure 4: Ground truths and predictions for the mixed context glancing behavior task. All models learn to average over the possible futures. Our SP models learn a better fit than the NP model, SP-GRU being the best (see zoomed insets).
  • Figure 5: Mean per-timestep log-likelihood (LL) over the sequences in the synthetic glancing mixed-context dataset. Higher is better.
  • ...and 7 more figures
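
The partner encoding described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `relative_to` and `pool_partners` are hypothetical, the change of perspective is reduced to a plain subtraction of the focal participant's features (an assumption standing in for the actual transformation), and the learned embedding that would normally precede the pooling is omitted. The sketch only shows why an elementwise max gives a representation that does not depend on partner order or group size.

```python
from typing import List, Sequence

Vector = List[float]


def relative_to(focal: Sequence[float], partner: Sequence[float]) -> Vector:
    """Express a partner's features from the focal participant's perspective.

    Assumption: a simple feature-wise subtraction stands in for the real
    change of reference frame.
    """
    return [p - f for p, f in zip(partner, focal)]


def pool_partners(focal: Sequence[float],
                  partners: Sequence[Sequence[float]]) -> Vector:
    """Pool any number of partners into one fixed-size vector.

    The elementwise max is symmetric, so the result is the same for any
    ordering of the partners, and its length depends only on the feature
    dimension, not on how many partners there are.
    """
    if not partners:
        raise ValueError("at least one partner is required")
    relative = [relative_to(focal, partner) for partner in partners]
    # zip(*relative) groups the values of each feature across all partners.
    return [max(column) for column in zip(*relative)]


# Toy features for participant p0 and partners p1, p2 (made-up values).
p0 = [0.0, 0.0, 1.0]
p1 = [1.0, 2.0, 0.0]
p2 = [-1.0, 3.0, 1.0]

pooled = pool_partners(p0, [p1, p2])
print(pooled)                          # [1.0, 3.0, 0.0]
print(pool_partners(p0, [p2, p1]))     # same result with partners swapped
print(len(pool_partners(p0, [p1])))    # 3, regardless of group size
```

Swapping the two partners leaves the pooled vector unchanged, and a group with a single partner yields a vector of the same length, which is the invariance the caption refers to.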