Interpretable Recognition of Cognitive Distortions in Natural Language Texts

Anton Kolonin, Anna Arinicheva

TL;DR

The paper tackles automated recognition of cognitive distortions in therapeutic texts while prioritizing interpretability and efficiency. It introduces an interpretable multiclass classifier built on heterarchical N-grams and a convolution-based recognition algorithm that respects pattern hierarchy. The approach achieves state-of-the-art F1 scores on two public datasets and releases open-source code and models for community use, demonstrating real-time applicability. The work advances practical AI-assisted psychological care by providing auditable linguistic patterns and a path toward broader linguistic generalization and dataset expansion.

Abstract

We propose a new approach to multi-factor classification of natural language texts based on weighted structured patterns such as N-grams, taking into account the heterarchical relationships between them. We apply it to a socially impactful problem: automating the detection of specific cognitive distortions in psychological care with an interpretable, robust and transparent artificial intelligence model. The proposed recognition and learning algorithms improve on the current state of the art in this field. The improvement is tested on two publicly available datasets and yields significant gains over the F1 scores reported in the literature for this task. Optimal hyper-parameters are determined, and the code and models are available for future use by the community.
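To make the idea of classifying with weighted N-gram patterns concrete, here is a minimal sketch. It is not the authors' algorithm: it ignores the heterarchical relationships between patterns and the convolution-based recognition described in the paper, and simply sums the weights of matched N-grams per class. The function names (`ngrams`, `score_text`), the labels, the patterns and the weights in `TOY_MODEL` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def ngrams(tokens, max_n):
    """Yield every contiguous n-gram (as a tuple) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def score_text(text, model, max_n=3):
    """Sum the weights of matched patterns per class and keep the evidence.

    `model` maps a class label to a dict of {n-gram tuple: weight}.
    Returns (scores, evidence); evidence lists the matched patterns,
    which is what makes the decision auditable.
    """
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    scores = defaultdict(float)
    evidence = defaultdict(list)
    for gram in ngrams(tokens, max_n):
        for label, patterns in model.items():
            weight = patterns.get(gram)
            if weight is not None:
                scores[label] += weight
                evidence[label].append(" ".join(gram))
    return dict(scores), dict(evidence)


# Hypothetical toy model: labels, patterns and weights are made up.
TOY_MODEL = {
    "overgeneralization": {("always",): 0.4, ("never",): 0.4, ("i", "always"): 0.8},
    "mind_reading": {("they", "think"): 0.9, ("think",): 0.2},
}

scores, evidence = score_text("I always fail and they think I am useless", TOY_MODEL)
for label in sorted(scores, key=scores.get, reverse=True):
    print(label, round(scores[label], 2), evidence[label])
```

Because every score is a sum over explicitly matched patterns, the `evidence` lists can be used to highlight the text segments responsible for each predicted label, which is the kind of interpretability the abstract refers to.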

Paper Structure

This paper contains 15 sections, 11 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: Explainability of distortion recognition: text segments corresponding to the distorted part are extracted and highlighted based on patterns revealed in the interpretable model provided by Bollen2021. Our research code is at the top; the output with highlighting is at the bottom.
  • Figure 2: Comparison of our results on real field data shreevastava-foltz-2021-dataset using unweighted ($F1$=0.46) and weighted ($F1$=0.47) recognition with the baselines obtained in article_1469178 ($F1$=0.45) and based on the earlier model Arinicheva2025 ($F1$=0.28), with error bars for three independent train/test splits.
  • Figure 3: Comparison of our results on the combined semi-synthetic data halil_2024 using unweighted ($F1$=0.9) and weighted ($F1$=0.89) recognition versus the baselines obtained in article_1469178 ($F1$=0.77) and based on the earlier model Arinicheva2025 ($F1$=0.28), with error bars for three independent train/test splits.
  • Figure 4: Comparison based on the field dataset shreevastava-foltz-2021-dataset with $F1$ scores obtained for certain distortions comparing our results (green) with the baseline presented in article_1469178 (blue) and based on the earlier model Arinicheva2025 (orange).