Advancing Automated Spatio-Semantic Analysis in Picture Description Using Language Models

Si-Ioi Ng; Pranav S. Ambadi; Kimberly D. Mueller; Julie Liss; Visar Berisha

TL;DR

This work targets automated spatio-semantic analysis of picture descriptions for assessing cognitive impairment, extracting and ordering content information units (CIUs) from descriptions of the Cookie Theft image. It presents a BERT-based pipeline fine-tuned with a multi-task loss $L=(1-\lambda)L_{BCE}+\lambda L_{rank}$, with $\lambda=0.1$ and margin $m=1$, to detect 23 CIUs and preserve their narrative order. Across 5-fold cross-validation, the approach achieves a median CIU precision of 93% and a median recall of 96%, with a sequence error rate of 24%. In external validation, its features correlate more strongly with ground-truth spatio-semantic features than those of a dictionary-based baseline. Clinical validation via ANCOVA shows that features derived from BERT-extracted CIUs perform comparably to those from manually annotated CIUs in distinguishing healthy from cognitively impaired groups. The method is open-sourced.
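The multi-task objective stated above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the source gives only the weighting $\lambda=0.1$ and the margin $m=1$, so the hinge form of the ranking term, the convention that an earlier-mentioned CIU should receive a lower order score, and every function and argument name here are assumptions.

```python
import math


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over per-CIU presence probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def pairwise_rank_loss(scores, positions, margin=1.0):
    """Mean hinge loss over ordered pairs of CIUs that are present.

    positions[i] is the ground-truth mention index of CIU i, or None if
    the CIU is absent. Assumed convention: if CIU i is mentioned before
    CIU j, then scores[j] should exceed scores[i] by at least `margin`.
    """
    present = [i for i, pos in enumerate(positions) if pos is not None]
    losses = []
    for i in present:
        for j in present:
            if positions[i] < positions[j]:
                losses.append(max(0.0, margin - (scores[j] - scores[i])))
    return sum(losses) / len(losses) if losses else 0.0


def multitask_loss(probs, labels, scores, positions, lam=0.1, margin=1.0):
    """L = (1 - lam) * L_BCE + lam * L_rank, as in the formula above."""
    l_bce = bce_loss(probs, labels)
    l_rank = pairwise_rank_loss(scores, positions, margin)
    return (1.0 - lam) * l_bce + lam * l_rank
```

With $\lambda=0.1$ the detection term dominates, so the ranking term acts as a light regularizer on ordering rather than as the main training signal.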

Abstract

Current methods for automated assessment of cognitive-linguistic impairment via picture description often neglect the visual narrative path, that is, the sequence and locations of the elements a speaker describes in the picture. Analyses of spatio-semantic features capture this path using content information units (CIUs), but manual tagging or dictionary-based mapping is labor-intensive. This study proposes a BERT-based pipeline, fine-tuned with binary cross-entropy and pairwise ranking losses, for automated CIU extraction and ordering from descriptions of the Cookie Theft picture. Evaluated by 5-fold cross-validation, it achieves 93% median precision and 96% median recall in CIU detection, and a 24% sequence error rate. The proposed method extracts features that exhibit strong Pearson correlations with ground truth, surpassing the dictionary-based baseline in external validation. These features also perform comparably to those derived from manual annotations in evaluating group differences via ANCOVA. The pipeline is shown to effectively characterize visual narrative paths for cognitive impairment assessment, and the implementation and models are open-sourced to the public.
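The evaluation metrics named above can be illustrated with a short sketch. The abstract does not define the sequence error rate, so this example assumes a definition analogous to word error rate: the Levenshtein distance between the predicted and reference CIU sequences, divided by the reference length. The function names and the CIU labels in the usage example are illustrative only.

```python
def precision_recall(predicted, reference):
    """Set-based precision and recall of detected CIUs."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


def edit_distance(a, b):
    """Levenshtein distance between two sequences of CIU labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def sequence_error_rate(predicted, reference):
    """Assumed definition: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(predicted, reference) / len(reference)


# Illustrative CIU sequences in order of mention (labels are hypothetical).
reference = ["boy", "cookie", "stool", "mother", "sink"]
predicted = ["boy", "stool", "cookie", "mother"]
precision, recall = precision_recall(predicted, reference)
ser = sequence_error_rate(predicted, reference)
```

In this example every predicted CIU is correct but one reference CIU is missed and two are swapped, so detection quality and ordering quality diverge, which is why the two kinds of metric are reported separately.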
