Robot-Led Vision Language Model Wellbeing Assessment of Children

Nida Itrat Abbasi, Fethiye Irmak Dogan, Guy Laban, Joanna Anderson, Tamsin Ford, Peter B. Jones, Hatice Gunes

TL;DR

This work investigates a robot-led wellbeing assessment for children using a Vision Language Model (VLM) guided by Child Apperception Test (CAT) stimuli. By having a NAO robot present pictorial prompts and analyzing children’s narratives with a VLM, the study compares automated judgments to those of a trained psychologist, examining agreement, consistency, and sensitivity to gender and age. Results show moderate reliability in identifying cases with no wellbeing concerns but limited accuracy for clinical concerns, with a notable false-positive bias for girls. The findings underscore both the promise and the challenges of deploying VLM-enabled robots in pediatric wellbeing assessments, highlighting the need for bias mitigation, cautious interpretation, and supplementary human oversight.

Abstract

This study presents a novel robot-led approach to assessing children's mental wellbeing using a Vision Language Model (VLM). Inspired by the Child Apperception Test (CAT), the social robot NAO presented children with pictorial stimuli to elicit verbal narratives about the images, which a VLM then evaluated in accordance with CAT assessment guidelines. The VLM's assessments were systematically compared with those provided by a trained psychologist. The results reveal that while the VLM demonstrates moderate reliability in identifying cases with no wellbeing concerns, its ability to accurately classify assessments with clinical concern remains limited. Moreover, although the model's performance was generally consistent when it was prompted with varying demographic factors such as age and gender, a significantly higher false positive rate was observed for girls, indicating potential sensitivity to the gender attribute. These findings highlight both the promise and the challenges of integrating VLMs into robot-led assessments of children's wellbeing.
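To make the gender-related finding concrete, the sketch below shows one way a per-group false positive rate could be computed when a model's labels are compared against a reference rater's labels. It is an illustration only, not the authors' analysis pipeline: the function name `false_positive_rate_by_group`, the record layout, and the toy records are all assumptions introduced here, and the toy data are invented rather than taken from the study.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Return the false positive rate per group.

    `records` is an iterable of (group, reference_label, model_label) tuples,
    where a label of True means "wellbeing concern" and False means "no concern".
    This layout is a hypothetical convention chosen for the example.
    """
    false_positives = defaultdict(int)
    reference_negatives = defaultdict(int)
    for group, reference_label, model_label in records:
        if not reference_label:  # only reference-negative cases can yield false positives
            reference_negatives[group] += 1
            if model_label:
                false_positives[group] += 1
    # Groups with no reference-negative cases are omitted to avoid division by zero.
    return {
        group: false_positives[group] / count
        for group, count in reference_negatives.items()
    }


# Invented toy data, for illustration only (not results from the paper).
toy_records = [
    ("girl", False, True),
    ("girl", False, False),
    ("girl", True, True),
    ("boy", False, False),
    ("boy", False, False),
    ("boy", True, False),
]
rates = false_positive_rate_by_group(toy_records)
```

A gap between the per-group rates in real data would still need a significance test before it could be described as the abstract describes it; the sketch only covers the rate computation itself.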

Paper Structure

This paper contains 13 sections, 1 equation, 1 figure, 4 tables.

Figures (1)

  • Figure 1: (a) The CRI data collection setup, (b) the CAT cards (Bellak, 1949) shown to the children and provided to the VLM (cartoonized with OpenAI Sora for display in the paper), (c) the VLM prompt, and (d) the VLM output.