Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind

Mo Yu, Qiujing Wang, Shunchi Zhang, Yisi Sang, Kangsheng Pu, Zekai Wei, Han Wang, Liyan Xu, Jing Li, Yue Yu, Jie Zhou

TL;DR

This work introduces ToM-in-AMC, the first dataset to assess machines' meta-learning of theory-of-mind (ToM) in realistic narratives by framing each movie as a few-shot character-understanding task. It benchmarks transductive and inductive baselines, including a novel ToM prompting method (ToMPro) that models multiple ToM dimensions, and conducts a large human study for ground-truth comparison. Results show that humans significantly outperform AI baselines: ToMPro achieves the best inductive performance yet still lags human accuracy by roughly 20%, underscoring the gap in current ToM capabilities. The findings highlight the importance of multi-dimensional ToM reasoning for narrative understanding and point to future directions for making AI systems more adept at meta-learning of social mental states in real-world, non-synthetic settings.

Abstract

When reading a story, humans can quickly understand new fictional characters from a few observations, mainly by drawing analogies to fictional and real people they already know. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP dataset, ToM-in-AMC, the first assessment of machines' meta-learning of ToM in a realistic narrative understanding scenario. Our dataset consists of ~1,000 parsed movie scripts, each corresponding to a few-shot character understanding task that requires models to mimic humans' ability to quickly grasp characters from a few opening scenes of a new movie. We propose a novel ToM prompting approach designed to explicitly assess the influence of multiple ToM dimensions. It surpasses existing baseline models, underscoring the significance of modeling multiple ToM dimensions for our task. Our extensive human study verifies that humans are capable of solving our problem by inferring characters' mental states based on their previously seen movies. In comparison, our systems based on either state-of-the-art large language models (GPT-4) or meta-learning algorithms lag >20% behind, highlighting a notable limitation in existing approaches' ToM capabilities.
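
The few-shot framing described above, where each movie is one task and a handful of opening scenes must suffice to recognize its characters later on, can be illustrated with a toy sketch. This is not the authors' method or data format: the bag-of-words `embed` function, the nearest-prototype rule, and the character names and lines are all hypothetical stand-ins chosen only to show the support/query structure of a single episode.

```python
from collections import Counter, defaultdict
import math


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_prototypes(support):
    """Build one prototype per character from the few support utterances.

    Counts are summed rather than averaged; cosine similarity is
    scale-invariant, so the ranking is the same as with a mean.
    """
    prototypes = defaultdict(Counter)
    for character, utterance in support:
        prototypes[character].update(embed(utterance))
    return dict(prototypes)


def guess_character(prototypes, utterance):
    """Assign an unlabeled utterance to the most similar character prototype."""
    query = embed(utterance)
    return max(prototypes, key=lambda c: cosine(prototypes[c], query))


# Hypothetical episode: a few labeled lines from the opening scenes of one movie.
support = [
    ("CAPTAIN", "hold the line and ready the ship"),
    ("CAPTAIN", "set course for the harbor"),
    ("DOCTOR", "the patient needs rest and medicine"),
    ("DOCTOR", "bring me the medicine now"),
]
prototypes = build_prototypes(support)

# Query lines from later scenes, with the speaker hidden.
print(guess_character(prototypes, "ready the ship for the harbor"))
print(guess_character(prototypes, "the patient needs medicine"))
```

A surface-level matcher like this only captures word overlap. The task described in the abstract asks for something harder: inferring who a character is from their mental states, which is why the paper compares meta-learning and prompting approaches against human performance.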

Paper Structure

This paper contains 42 sections, 8 equations, 19 figures, and 11 tables.

Figures (19)

  • Figure 1: Overview of our ToM-in-AMC task and the proposed meta-learning formulation.
  • Figure 2: Our two proposed meta-learning approaches for the character prediction task. (top) the base learner (Longformer-P); (middle) the prototypical network approach; (right) the LEOPARD approach.
  • Figure 3: Our proposed ToMPro approach. The method first (a) generates character mental descriptions along multiple ToM dimensions based on input scenes; then (b) predicts the character identities in a new test scene using the generated descriptions.
  • Figure 4: Ablation of ToMPro on the 5 ToM dimensions.
  • Figure 5: Performance by difficulty level, measured by the number of speakers in a scene.
  • ...and 14 more figures