
Does Rationale Quality Matter? Enhancing Mental Disorder Detection via Selective Reasoning Distillation

Hoyun Song, Huije Lee, Jisu Shin, Sukmin Cho, Changgeon Ko, Jong C. Park

TL;DR

This work examines how the quality of rationale explanations from large language models influences knowledge distillation into smaller models for mental health detection. It introduces Selective Knowledge Distillation (SD), which evaluates teacher-generated rationales against domain knowledge (DSM-5 criteria) and trains small models only on high-quality, domain-aligned rationales. Across multiple teacher-student pairs and CoT prompts on Reddit depression data, SD consistently improves detection accuracy and the clinical quality of generated rationales, and rationale quality correlates strongly with performance. The findings underscore the value of careful data curation in distillation and offer a practical framework for interpretable, domain-specific mental health detection and explanation generation, with potential applicability to other clinically guided tasks.

Abstract

The detection of mental health problems from social media, and the interpretation of these results, have been extensively explored. Research has shown that incorporating clinical symptom information into a model enhances its domain expertise, improving both detection and interpretation performance. Although large language models (LLMs) have been shown to generate effective explanatory rationales for mental health detection, their large parameter counts and high computational cost limit their practicality. Reasoning distillation transfers this ability to smaller language models (SLMs), but LLM-generated rationales vary in relevance and domain alignment, which poses a challenge. This paper investigates how rationale quality affects SLM performance in mental health detection and explanation generation. We hypothesize that distilling only high-quality, domain-relevant rationales improves the resulting student models. To this end, we propose a framework that selects rationales based on their alignment with expert clinical reasoning. Experiments show that this quality-focused approach significantly improves SLM performance in both mental disorder detection and rationale generation. This work highlights the importance of rationale quality and offers a framework for knowledge transfer in mental health applications.
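
The rationale selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The scorer `overlap_f1` is a hypothetical bag-of-words F1 that stands in for an embedding-based metric such as BERTScore. Scoring a rationale by its best match against any single DSM-5 symptom description is also an assumption, and the criteria strings are abbreviated paraphrases of DSM-5 major depressive disorder symptoms.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def overlap_f1(candidate: str, reference: str) -> float:
    """Hypothetical bag-of-words F1, standing in for a semantic metric like BERTScore."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    common = sum((cand & ref).values())  # multiset intersection keeps min counts
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rationale_quality(rationale: str, criteria: list[str]) -> float:
    """Assumed quality score: the best match against any one symptom description."""
    return max(overlap_f1(rationale, c) for c in criteria)


def select_rationale(candidates: list[str], criteria: list[str]) -> str:
    """Keep the candidate rationale most aligned with the domain criteria."""
    return max(candidates, key=lambda r: rationale_quality(r, criteria))


# Abbreviated paraphrases of DSM-5 depression symptoms (illustrative only).
criteria = [
    "Depressed mood most of the day, nearly every day.",
    "Markedly diminished interest or pleasure in activities.",
    "Insomnia or hypersomnia nearly every day.",
]
# Two hypothetical teacher rationales for the same post.
candidates = [
    "The user reports insomnia nearly every day and no interest in activities.",
    "The post sounds negative, so the user may be depressed.",
]
best = select_rationale(candidates, criteria)
print(best)
```

The first rationale ties the post to specific symptoms, so it scores higher and is kept for distillation. The second only asserts a label, so it is dropped. Replacing `overlap_f1` with a semantic similarity metric changes only the scorer. The per-post selection logic stays the same.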

Paper Structure

This paper contains 40 sections, 3 equations, 5 figures, 17 tables.

Figures (5)

  • Figure 1: Illustration of varying rationale quality. R1 effectively connects the social media post to specific symptoms in the DSM-5 criteria for major depressive disorder, demonstrating high relevance. R2 lacks these connections, showing low relevance. These examples were generated by GPT-3.5.
  • Figure 2: Overview of our proposed framework for selective reasoning distillation. Unlike standard reasoning distillation, our framework involves generating various rationales for each post, assessing their quality based on relevance to domain knowledge, and selecting the highest-quality rationale for distillation.
  • Figure 3: Correlation between the quality of teacher-generated rationales and the detection performance of student models. Lines connect the performance of the same student model with and without selective distillation. Markers indicate different student models, while colors indicate different teacher models.
  • Figure 4: Distribution of semantic similarity scores between teacher-generated rationales and DSM-5 diagnostic criteria symptom descriptions for depression. We utilized BERTScore to measure the similarity scores and standard CoT prompts to generate rationales. The histograms in each panel, colored in red and blue, represent rationales generated without and with our proposed quality-based selection method, respectively.
  • Figure 5: Ablation study on different selection criteria, using standard CoT prompts. Each bar shows the test-set detection accuracy of a student model trained on rationales from the indicated teacher model.
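
Figure 4 measures the similarity between rationales and DSM-5 criteria with BERTScore. The core of that metric works by greedy matching. Each token embedding in one text is paired with its most similar token embedding in the other text by cosine similarity. Precision averages these maxima over the candidate's tokens, and recall averages them over the reference's tokens. The sketch below reproduces only that core on hand-made toy vectors. The real metric uses contextual embeddings from a pretrained model and can optionally apply IDF weighting and baseline rescaling, none of which is shown here. Treating the rationale as the candidate and the DSM-5 description as the reference is an assumption.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a: list[Vector], b: list[Vector]) -> list[list[float]]:
    """Pairwise cosine similarities: rows index tokens of a, columns tokens of b."""
    a_n = [normalize(v) for v in a]
    b_n = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def bertscore_core(candidate: list[Vector], reference: list[Vector]) -> tuple[float, float, float]:
    """Greedy-matching precision, recall, and F1 (no IDF weighting, no rescaling)."""
    sim = cosine_matrix(candidate, reference)
    # Precision: each candidate token matched to its closest reference token.
    precision = sum(max(row) for row in sim) / len(candidate)
    # Recall: each reference token matched to its closest candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate))) for j in range(len(reference))
    ) / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall != 0 else 0.0
    return precision, recall, f1


# Toy 2-D "token embeddings" (hypothetical values, not from any model).
rationale_tokens = [[1.0, 0.0], [0.0, 1.0]]
criterion_tokens = [[1.0, 0.0]]
p, r, f = bertscore_core(rationale_tokens, criterion_tokens)
print(p, r, f)
```

In this toy case, the criterion's single token is fully covered, so recall is 1.0. Only one of the two rationale tokens is matched, so precision is 0.5. The resulting F1 is 2/3.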