BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change
Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger
TL;DR
This paper tackles the lack of datasets for automatic Ambivalence/Hesitancy (A/H) recognition in health contexts by introducing the BAH dataset, a multimodal video collection from 300 Canadian participants answering seven prompts designed to elicit A/H. Each video is annotated by behavioural experts at video- and frame-level, with onset/offset timestamps and cross-modal cues, plus transcripts and participant metadata; data are public with accompanying code and pretrained weights. Baseline experiments assess visual, audio, and text modalities, both individually and in fusion, revealing that temporal context and multimodal integration improve A/H recognition, while zero-shot and domain-adaptive personalization experiments illustrate a path toward personalized digital health interventions. The dataset supports frame- and video-level tasks and enables development of context-aware, interpretable models for automated health coaching and avatar-driven interventions, with potential impact on scalable behavioural change programs. Overall, BAH provides a rigorous resource and initial benchmarks to advance A/H recognition in the wild for digital health.
Abstract
Ambivalence and hesitancy (A/H), two closely related constructs, are the primary reasons why individuals delay, avoid, or abandon health behaviour changes. A/H is a subtle and conflicting emotion that places a person in a state between positive and negative orientations, or between acceptance and refusal to do something. It manifests as a discord in affect between multiple modalities or within a single modality, such as facial expressions, vocal expressions, and body language. Although experts can be trained to recognize A/H, as is done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no dataset currently exists for the design of machine learning models to recognize A/H. This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for multimodal recognition of A/H in videos. It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada answering predefined questions designed to elicit A/H. It is intended to mirror real-world online personalized behaviour change interventions. BAH is annotated by three experts, who provide timestamps indicating where A/H occurs, as well as frame- and video-level annotations with A/H cues. Video transcripts, cropped and aligned faces, and participants' metadata are also provided. Because ambivalence and hesitancy manifest similarly in practice, we provide a binary annotation indicating the presence or absence of A/H. Additionally, this paper includes benchmarking results using baseline models on BAH for frame- and video-level recognition, zero-shot prediction, and personalization using source-free domain adaptation. The data, code, and pretrained weights are available.
