Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction

Philipp Wicke

TL;DR

The paper investigates how Large Language Models (LLMs) understand gestures described in text and how this affects human–AI interaction. It proposes a gesture-centric methodology that uses the Verbal Message List (VML) to build a dataset pairing textual prompts with gesture descriptions, and employs Turing Experiments (TEs) to evaluate open-source LLMs. The study emphasizes cross-cultural gesture interpretation and outlines a plan to extend the work to multimodal models, with the aim of establishing a dataset and an empirical framework for gesture semantics in AI. This work can inform the design of more context-aware conversational agents and advance gesture-based reasoning in human–AI systems.

Abstract

The rise of Large Language Models (LLMs) has affected various disciplines in ways that go beyond mere text generation. Looking past the textual nature of these models, this project proposal investigates the interaction between LLMs and non-verbal communication, focusing specifically on gestures. The proposal sets out a plan to examine how proficiently LLMs decipher both explicit and implicit non-verbal cues in textual prompts, and how well they associate these gestures with various contextual factors. The research proposes to draw on established psycholinguistic study designs to construct a comprehensive dataset that pairs textual prompts with detailed gesture descriptions, including regional variations and semantic labels. To assess LLMs' comprehension of gestures, the planned experiments evaluate the models' ability to simulate human behaviour and thereby replicate psycholinguistic experiments. These experiments take cultural dimensions into account and measure the agreement between the gestures identified by the LLMs and those in the dataset, shedding light on how the models interpret non-verbal cues such as gestures in context.
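The agreement measure described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical record layout (`GestureItem` with `message`, `gesture` and `region` fields) and uses plain exact-match accuracy after light normalisation as a stand-in. The proposal may well score agreement differently, for example through human coding or a chance-corrected statistic. All names and toy entries below are illustrative, not taken from the dataset.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureItem:
    """Hypothetical dataset entry pairing a verbal message with a gesture description."""
    message: str   # verbal message the gesture conveys, e.g. "Stop"
    gesture: str   # reference gesture description (semantic label)
    region: str    # regional/cultural variant this description belongs to


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so formatting differences are not counted as disagreement."""
    return " ".join(label.lower().split())


def agreement(items: list[GestureItem], predictions: dict[tuple[str, str], str]) -> float:
    """Fraction of items whose LLM-identified gesture exactly matches the reference label.

    `predictions` maps (message, region) to the gesture the model produced;
    items with no prediction count as disagreements.
    """
    if not items:
        raise ValueError("no items to score")
    hits = sum(
        normalise(predictions.get((item.message, item.region), "")) == normalise(item.gesture)
        for item in items
    )
    return hits / len(items)


def agreement_by_region(
    items: list[GestureItem], predictions: dict[tuple[str, str], str]
) -> dict[str, float]:
    """Agreement computed separately for each region, to expose cultural differences."""
    groups: dict[str, list[GestureItem]] = defaultdict(list)
    for item in items:
        groups[item.region].append(item)
    return {region: agreement(group, predictions) for region, group in groups.items()}


# Toy, made-up entries for illustration only (not data from the proposal).
items = [
    GestureItem("Stop", "raised open palm", "region A"),
    GestureItem("Come here", "beckoning index finger", "region A"),
    GestureItem("Come here", "palm-down beckoning", "region B"),
]
predictions = {
    ("Stop", "region A"): "Raised open palm",
    ("Come here", "region A"): "beckoning index finger",
    ("Come here", "region B"): "beckoning index finger",
}
print(agreement(items, predictions))            # 0.6666666666666666
print(agreement_by_region(items, predictions))  # {'region A': 1.0, 'region B': 0.0}
```

Keying predictions by (message, region) rather than by message alone keeps regional variants of the same message apart. The per-region breakdown is one simple way to surface a cultural gap that a single overall score would hide.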

Paper Structure (13 sections, 2 figures)

This paper contains 13 sections and 2 figures.

Figures (2)

  • Figure 1: Probing a Large Language Model (LLM) through the input of gesture descriptions can serve as a valuable means to evaluate its understanding of gestures, contributing to the refinement of human-AI interaction.
  • Figure 2: Suggested Turing Experiment (TE) based on the VML list. An item from the VML list (e.g., Stop) is turned into an appropriate prompt for the TE, which is then fed to the language model (e.g., Llama-2) for evaluation.
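The pipeline in Figure 2 (VML item, then TE prompt, then language model) could be sketched roughly as follows. This is a minimal illustration under several assumptions. The prompt template, the helper names `build_te_prompt`, `run_te` and `stub_generate`, and the participant names and countries are all hypothetical. The model call is left as a pluggable `generate` function rather than any specific Llama-2 API. Each simulated participant is given a country so that the same item can be run under different cultural framings.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical prompt template; the proposal's actual TE wording is not given here.
TEMPLATE = (
    "{name} grew up in {country}. Without speaking, {name} wants to tell "
    'someone "{message}". Describe the gesture {name} makes.\n'
    "Gesture:"
)


def build_te_prompt(message: str, name: str, country: str) -> str:
    """Turn one VML item into a completion-style prompt for one simulated participant."""
    return TEMPLATE.format(name=name, country=country, message=message)


def run_te(
    message: str,
    participants: list[tuple[str, str]],
    generate: Callable[[str], str],
) -> dict[tuple[str, str], str]:
    """Run the same VML item for several simulated participants.

    `generate` is any function mapping a prompt to model text (for example a
    wrapper around a locally hosted Llama-2); only the first line of the
    stripped completion is kept as the gesture description.
    """
    answers: dict[tuple[str, str], str] = {}
    for name, country in participants:
        completion = generate(build_te_prompt(message, name, country)).strip()
        answers[(name, country)] = completion.splitlines()[0] if completion else ""
    return answers


def stub_generate(prompt: str) -> str:
    """Placeholder for a real LLM call; always returns the same made-up completion."""
    return " Raises an open palm toward the other person.\nThe end."


participants = [("Alex", "Germany"), ("Mei", "Japan")]  # hypothetical simulated participants
answers = run_te("Stop", participants, stub_generate)
print(answers[("Alex", "Germany")])  # Raises an open palm toward the other person.
```

The collected answers could then be scored against the dataset's reference gestures with an agreement measure of the kind the abstract describes.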