A Computational Approach to Analyzing Disrupted Language in Schizophrenia: Integrating Surprisal and Coherence Measures

Gowtham Premananth, Carol Espy-Wilson

TL;DR

This study seeks objective linguistic markers of schizophrenia by quantifying spontaneous speech with two measures: surprisal and semantic coherence. Surprisal is computed with GPT-2 as $\mathrm{Surprisal}(w_i) = -\log P(w_i \mid w_1, \ldots, w_{i-1})$, and semantic coherence is measured with both LDA-based topic modeling and BERT-based embeddings. Speakers with schizophrenia show slightly higher average surprisal and slightly lower semantic coherence than healthy controls. The relationship between surprisal and coherence differs between the two groups, and surprisal is positively associated with BPRS severity. The work highlights the potential of computational language metrics as biomarkers, while acknowledging that the sample is small and only mildly affected, and it calls for larger, more diverse datasets in future work.
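To make the surprisal formula concrete, the following is a minimal sketch rather than the authors' pipeline. It swaps GPT-2 for a toy add-one-smoothed bigram model, so each word is conditioned only on the previous word instead of the full history $w_1, \ldots, w_{i-1}$. The function names (`train_bigram`, `cond_prob`, `surprisals`, `mean_surprisal`), the example corpus, and the use of the natural logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Return (context counts, bigram counts, vocabulary size) for a token list."""
    contexts = Counter(tokens[:-1])  # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(set(tokens))


def cond_prob(prev, word, model):
    """Add-one smoothed estimate of P(word | prev); a toy stand-in for GPT-2."""
    contexts, bigrams, vocab_size = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def surprisals(tokens, model):
    """Surprisal -log P(w_i | w_{i-1}) in nats for each token after the first."""
    return [-math.log(cond_prob(p, w, model)) for p, w in zip(tokens, tokens[1:])]


def mean_surprisal(tokens, model):
    """Average surprisal over an utterance (0.0 if it has fewer than two tokens)."""
    values = surprisals(tokens, model)
    return sum(values) / len(values) if values else 0.0


# Hypothetical training text; a real setup would use a pretrained language model.
model = train_bigram("the cat sat on the mat the cat ate".split())
expected = mean_surprisal("the cat sat".split(), model)
unexpected = mean_surprisal("the sat cat".split(), model)
print(f"familiar order: {expected:.3f} nats, scrambled order: {unexpected:.3f} nats")
```

The scrambled word order receives the higher mean surprisal because its word transitions never occur in the toy corpus. Average surprisal over a speaker's transcript is the quantity compared between groups, here approximated with a much weaker model.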

Abstract

Language disruptions are among the well-known effects of schizophrenia symptoms and often manifest as disorganized speech and impaired discourse coherence. These abnormalities in spontaneous language production reflect underlying cognitive disturbances and could serve as objective markers for the diagnosis and symptom severity of schizophrenia. This study characterizes these disruptions with two computational linguistic measures: surprisal and semantic coherence. Both are computed with computational models and compared between subjects with schizophrenia and healthy controls. The study also examines how the two measures change with the degree of schizophrenia symptom severity.
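Semantic coherence can be illustrated in the same spirit. The text here does not say how coherence is derived from the BERT embeddings or LDA topics, so the sketch below assumes one common formulation: the mean cosine similarity between vectors of adjacent sentences. Bag-of-words counts stand in for the embeddings, and the names `bow_vector`, `cosine`, and `coherence` are hypothetical.

```python
import math
from collections import Counter


def bow_vector(sentence):
    """Bag-of-words counts; a toy stand-in for a BERT sentence embedding."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence(sentences):
    """Assumed definition: mean similarity between each sentence and the next."""
    vectors = [bow_vector(s) for s in sentences]
    sims = [cosine(u, v) for u, v in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else 0.0


on_topic = ["the dog chased the ball", "the dog caught the ball"]
off_topic = ["the dog chased the ball", "taxes rise every spring"]
print(f"on topic: {coherence(on_topic):.3f}, off topic: {coherence(off_topic):.3f}")
```

Under this assumed definition, a transcript that drifts between unrelated topics scores lower than one that stays on a single topic. That is the direction of the group difference reported for speakers with schizophrenia.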

Paper Structure

This paper contains 7 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: BERT-based semantic coherence and LDA-based semantic coherence
  • Figure 2: Surprisal vs semantic coherence
  • Figure 3: Average surprisal in relation to symptom severity based on BPRS scores
  • Figure 4: Semantic coherence based on BERT embeddings in relation to symptom severity based on BPRS scores