Uncertainty Modeling in Multimodal Speech Analysis Across the Psychosis Spectrum

Morteza Rohanian; Roya M. Hüppi; Farhad Nooralahzadeh; Noemi Dannecker; Yves Pauli; Werner Surbeck; Iris Sommer; Wolfram Hinzen; Nicolas Langer; Michael Krauthammer; Philipp Homan

TL;DR

This study targets robust detection of psychosis-related speech disruptions across the psychosis continuum by incorporating both data uncertainty and model uncertainty into a multimodal analysis. It introduces an uncertainty-aware Temporal Context Fusion (TCF) that models unimodal latent distributions $h_i^M \sim \mathcal{N}(\mu_i^M, \Sigma_i^M)$, derives the fusion weights $w_i^A$ and $w_i^T$ from inverse variances, and optimizes a calibration-ordinality loss $L_{CO}$ so that predicted uncertainties align with prediction errors. The approach fuses traditional and deep acoustic features (eGeMAPS, DEEPSPECTRUM, wav2vec 2.0) with text embeddings from PELICAN/XLM-RoBERTa. Evaluated on 114 German-speaking participants, including early psychosis and schizotypy groups, it achieves $ECE = 4.5 \times 10^{-2}$ and $F1 = 0.83$. Results show improved prediction accuracy and reliability across structured, semi-structured, and narrative tasks, with strong cross-context generalization, suggesting practical utility for early detection and personalized assessment within the psychosis spectrum.
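
The inverse-variance weighting mentioned above can be illustrated with a minimal sketch. This is not the authors' TCF implementation: it assumes that the superscripts $A$ and $T$ denote the acoustic and text modalities, that each $\Sigma_i^M$ is diagonal (so it reduces to a per-dimension variance), and that the two weights are normalized to sum to one. The function name `inverse_variance_fusion` and the `eps` stabilizer are hypothetical.

```python
from typing import List, Tuple


def inverse_variance_fusion(
    mu_a: List[float],
    var_a: List[float],
    mu_t: List[float],
    var_t: List[float],
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """Fuse acoustic (A) and text (T) latent means, dimension by dimension.

    Assumption: diagonal covariances, so each modality contributes one
    variance per latent dimension. Returns (fused_mean, w_a, w_t), where
    w_a[d] + w_t[d] == 1 for every dimension d.
    """
    fused, w_a_all, w_t_all = [], [], []
    for m_a, v_a, m_t, v_t in zip(mu_a, var_a, mu_t, var_t):
        # Precision is the inverse variance; eps guards against division by zero.
        prec_a = 1.0 / (v_a + eps)
        prec_t = 1.0 / (v_t + eps)
        # The less uncertain modality (higher precision) gets the larger weight.
        w_a = prec_a / (prec_a + prec_t)
        w_t = 1.0 - w_a
        fused.append(w_a * m_a + w_t * m_t)
        w_a_all.append(w_a)
        w_t_all.append(w_t)
    return fused, w_a_all, w_t_all


# Acoustic variance 1.0 vs. text variance 3.0: the acoustic mean dominates.
fused, w_a, w_t = inverse_variance_fusion([1.0], [1.0], [3.0], [3.0])
```

With these inputs the acoustic weight is roughly 0.75 and the fused mean roughly 1.5, so a modality whose latent distribution is three times as uncertain contributes a third as much.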

Abstract

Capturing subtle speech disruptions across the psychosis spectrum is challenging because of the inherent variability in speech patterns. This variability reflects individual differences and the fluctuating nature of symptoms in both clinical and non-clinical populations. Accounting for uncertainty in speech data is essential for predicting symptom severity and improving diagnostic precision. Speech disruptions characteristic of psychosis appear across the spectrum, including in non-clinical individuals. We develop an uncertainty-aware model integrating acoustic and linguistic features to predict symptom severity and psychosis-related traits. Quantifying uncertainty in specific modalities allows the model to address speech variability, improving prediction accuracy. We analyzed speech data from 114 participants, including 32 individuals with early psychosis and 82 with low or high schizotypy, collected through structured interviews, semi-structured autobiographical tasks, and narrative-driven interactions in German. The model improved prediction accuracy, reducing RMSE and achieving an F1-score of 83% with ECE = 4.5e-2, showing robust performance across different interaction contexts. Uncertainty estimation improved model interpretability by identifying reliability differences in speech markers such as pitch variability, fluency disruptions, and spectral instability. The model dynamically adjusted to task structures, weighting acoustic features more in structured settings and linguistic features in unstructured contexts. This approach strengthens early detection, personalized assessment, and clinical decision-making in psychosis-spectrum research.
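
For readers unfamiliar with the calibration metric reported above, the following sketch computes the standard binned expected calibration error (ECE) for a classifier. The abstract does not state which ECE variant or how many bins the study used, so the equal-width binning, the default `n_bins=10`, and the function name are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct:     whether each prediction matched the true label.
    """
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Four predictions at 0.9 confidence (three correct), two at 0.6 (one correct).
ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9, 0.6, 0.6],
    [True, True, True, False, True, False],
)
```

A lower value means the stated confidences track the observed accuracy more closely; a perfectly calibrated model has an ECE of 0.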
