BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

TL;DR

This paper tackles the lack of datasets for automatic Ambivalence/Hesitancy (A/H) recognition in health contexts by introducing the BAH dataset, a multimodal video collection from 300 Canadian participants answering seven prompts designed to elicit A/H. Each video is annotated by behavioural experts at video- and frame-level, with onset/offset timestamps and cross-modal cues, plus transcripts and participant metadata; data are public with accompanying code and pretrained weights. Baseline experiments assess visual, audio, and text modalities, both individually and in fusion, revealing that temporal context and multimodal integration improve A/H recognition, while zero-shot and domain-adaptive personalization experiments illustrate a path toward personalized digital health interventions. The dataset supports frame- and video-level tasks and enables development of context-aware, interpretable models for automated health coaching and avatar-driven interventions, with potential impact on scalable behavioural change programs. Overall, BAH provides a rigorous resource and initial benchmarks to advance A/H recognition in the wild for digital health.
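The relationship between onset/offset timestamps and frame- and video-level labels can be illustrated with a minimal sketch. It is not the dataset's actual annotation tooling: the segment format (pairs of seconds), the frame rate, the function names, and the rule that a video is positive when any frame is positive are all assumptions made for illustration.

```python
from typing import List, Tuple


def frame_labels(segments: List[Tuple[float, float]], n_frames: int, fps: float) -> List[int]:
    """Return one 0/1 label per frame; 1 means the frame falls inside an A/H segment."""
    labels = [0] * n_frames
    for onset, offset in segments:
        for i in range(n_frames):
            # Assumption: frame i is stamped with the time instant i / fps.
            t = i / fps
            if onset <= t < offset:
                labels[i] = 1
    return labels


def video_label(labels: List[int]) -> int:
    """Hypothetical rule: a video is positive if at least one frame is positive."""
    return int(any(labels))


# Hypothetical clip: 20 frames at 5 fps, with one A/H segment from 1.0 s to 2.0 s.
example = frame_labels([(1.0, 2.0)], n_frames=20, fps=5.0)
print(example, video_label(example))
```

Under these assumptions, frames 5 to 9 are marked positive and the clip as a whole is labelled positive.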

Abstract

Ambivalence and hesitancy (A/H), two closely related constructs, are the primary reasons why individuals delay, avoid, or abandon health behaviour changes. A/H is a subtle and conflicting emotion that places a person in a state between positive and negative orientations, or between accepting and refusing to do something. It manifests as a discord in affect across multiple modalities or within a single modality, such as facial and vocal expressions and body language. Although experts can be trained to recognize A/H, as is done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no dataset currently exists for the design of machine learning models to recognize A/H. This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for multimodal recognition of A/H in videos. It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada answering predefined questions designed to elicit A/H. It is intended to mirror real-world online personalized behaviour change interventions. BAH is annotated by three experts, who provide timestamps indicating where A/H occurs as well as frame- and video-level annotations with A/H cues. Video transcripts, cropped and aligned faces, and participant metadata are also provided. Since A and H manifest similarly in practice, we provide a binary annotation indicating the presence or absence of A/H. Additionally, this paper includes benchmarking results using baseline models on BAH for frame- and video-level recognition, zero-shot prediction, and personalization using source-free domain adaptation. The data, code, and pretrained weights are available.
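To put the reported dataset size in perspective, the following short calculation derives a few averages from the three figures stated in the abstract (1,427 videos, 10.60 hours, 300 participants). The variable names are illustrative, and the derived values are simple means, not statistics reported by the authors.

```python
# Figures stated in the abstract.
n_videos = 1427
n_participants = 300
total_hours = 10.60

# Derived averages (simple means; the per-video distribution is not given here).
mean_video_seconds = total_hours * 3600 / n_videos
mean_videos_per_participant = n_videos / n_participants
mean_minutes_per_participant = total_hours * 60 / n_participants

print(f"mean video length: {mean_video_seconds:.1f} s")
print(f"mean videos per participant: {mean_videos_per_participant:.2f}")
print(f"mean recorded time per participant: {mean_minutes_per_participant:.2f} min")
```

These work out to roughly 26.7 seconds per video, about 4.76 videos per participant, and about 2.12 minutes of recording per participant.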

Paper Structure

This paper contains 33 sections, 21 figures, 24 tables.

Figures (21)

  • Figure 1: Examples of body language cues used by annotators to identify the occurrence of A/H: "looking away," and "changing posture."
  • Figure 2: BAH dataset collection and annotation procedure. First, a participant accesses our web platform and goes through an initial test/calibration to ensure the quality of the data. An avatar guides them throughout the entire process. Seven questions are presented to the participant, who is recorded while answering them. Once the data is captured, it is transferred by the Administrator to our local server. It is then annotated at several levels by an expert to determine when A/H occurs.
  • Figure 3: Multimodal model used to produce baseline performance [richet-abaw-24].
  • Figure 4: File structure of the shared BAH dataset.
  • Figure 5: Examples taken from the platform to present our "Automatic Expression Recognition" (AER) web-based platform (https://www.aerstudy.ca/).
  • ...and 16 more figures