Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts

Yuxin Zhu, Yuting Guo, Noah Marchuck, Abeed Sarker, Yun Wang

TL;DR

The paper addresses the challenge of ADHD diagnosis by leveraging narrative transcripts and a hybrid AI approach. It combines LLaMA3 prompting, RoBERTa classification, and a TF-IDF–SVM with engineered features into a majority-voting ensemble. Evaluated on 441 youths from the Healthy Brain Network with a 60/20/20 train/dev/test split, the ensemble achieves a test $F_1$ of $0.71$ and a recall of $0.91$, outperforming the individual models. The contributions include a carefully designed LLM prompt, a fine-tuned transformer classifier, feature-rich text representations for the SVM, and an ensemble framework that capitalizes on model diversity. This work demonstrates the potential of hybrid architectures to enhance psychiatric text classification and points to future directions involving larger, more diverse datasets and more sophisticated ensembling and interpretability methods. The findings offer a promising path toward objective, narrative-based aids for psychiatric assessment and decision support, contingent on broader validation.
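Because $F_1$ is the harmonic mean of precision and recall, the two reported test metrics together determine the precision they imply. The Python sketch below solves $F_1 = 2PR/(P+R)$ for $P$. The helper names are hypothetical, and because the reported values are rounded, the result is an approximation rather than a figure stated in the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R.

    Rearranging F1 * (P + R) = 2PR gives P = F1 * R / (2R - F1),
    which is only defined when 2R > F1.
    """
    if 2 * recall <= f1:
        raise ValueError("requires 2 * recall > f1")
    return f1 * recall / (2 * recall - f1)


# Reported (rounded) ensemble test metrics from the TL;DR.
p = implied_precision(0.71, 0.91)
print(f"implied precision ~ {p:.2f}")
# Plugging the implied precision back in recovers the reported F1.
print(f"round-trip F1 ~ {f1_score(p, 0.91):.2f}")
```

With $F_1 = 0.71$ and recall $0.91$, this gives a precision of roughly 0.58, subject to rounding in the reported metrics.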

Abstract

Despite rapid advances in large language models (LLMs), their integration with traditional supervised machine learning (ML) techniques, which have proven applicability to medical data, remains underexplored. This is particularly true for psychiatric applications, where narrative data often exhibit nuanced linguistic and contextual complexity and can benefit from combining multiple models with differing characteristics. In this study, we introduce an ensemble framework for the binary classification of Attention-Deficit/Hyperactivity Disorder (ADHD) diagnosis from narrative transcripts. Our approach integrates three complementary models: LLaMA3, an open-source LLM that captures long-range semantic structure; RoBERTa, a pre-trained transformer model fine-tuned on labeled clinical narratives; and a Support Vector Machine (SVM) classifier trained on TF-IDF-based lexical features. Their predictions are aggregated by majority voting to improve predictive robustness. The dataset comprises 441 instances: 352 for training and 89 for validation. Empirical results show that the ensemble outperforms the individual models, achieving an $F_1$ score of 0.71 (95% CI: [0.60, 0.80]). Compared with the best-performing individual model (SVM), the ensemble improved recall while maintaining competitive precision, indicating high sensitivity to ADHD-related linguistic cues. These findings demonstrate the promise of hybrid architectures that combine the semantic richness of LLMs with the interpretability and pattern-recognition capabilities of traditional supervised ML, offering a new direction for robust and generalizable psychiatric text classification.
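The abstract's 352/89 counts and the TL;DR's 60/20/20 train/dev/test split are consistent if the 352 "training" instances include the development set. The Python sketch below checks that arithmetic under one assumption: held-out sizes are rounded up, as scikit-learn's `train_test_split` does for a fractional `test_size`. The helper `split_sizes` is hypothetical, and the paper is not quoted here on how its splits were rounded or drawn.

```python
import math


def split_sizes(n: int, held_out_frac: float) -> tuple[int, int]:
    """Return (n_kept, n_held_out), rounding the held-out part up.

    Assumption: this mirrors how scikit-learn's train_test_split sizes a
    fractional test_size, i.e. n_held_out = ceil(held_out_frac * n).
    """
    n_held_out = math.ceil(held_out_frac * n)
    return n - n_held_out, n_held_out


# Step 1: hold out 20% of the 441 instances as the test set.
train_dev, test = split_sizes(441, 0.20)
# Step 2: carve out the dev set (20% of the total is 25% of the remainder).
train, dev = split_sizes(train_dev, 0.25)
print(train_dev, test)
print(train, dev, test)
```

Under this assumption, the split yields 352 train+dev and 89 test instances, with the 352 dividing into 264 for training and 88 for development.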

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: End-to-end ensemble-based narrative classification pipeline. Narrative transcripts elicited immediately after fMRI scanning (“post-scan interviews”) undergo preprocessing, including tokenization, TF-IDF vectorization, engineered feature extraction (e.g., response length, question length), and isolation of interviewee texts, before being fed into three complementary classifiers: (1) LLaMA3 via optimized prompt engineering; (2) a fine-tuned RoBERTa transformer; and (3) a support vector machine leveraging both TF-IDF lexical features and additional engineered metrics. Each model independently generates an ADHD vs. non-ADHD prediction, and these are combined under a majority-voting rule (narratives are labeled ADHD if at least two models concur) to produce the final diagnostic classification. Directed arrows denote the flow of data through each module, illustrating the modular architecture of the ensemble framework.
  • Figure 2: Example of narrative data from a post-scan interview illustrating ADHD-related response patterns. The participant’s answers are fragmented and lack coherence, with shifts in focus and expressions of confusion (e.g., "I don't know"). These behaviors—difficulty staying on topic and providing detailed responses—are characteristic of ADHD and serve as key indicators in the classification process.
  • Figure 3: The full final prompt used in this study.
  • Figure 4: The confusion matrices for each individual model and the ensemble model.
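The voting rule in the Figure 1 caption (a narrative is labeled ADHD if at least two models concur) is simple enough to state directly in code. This Python sketch assumes each model's output has already been mapped to a binary label (1 = ADHD, 0 = non-ADHD, an encoding chosen here for illustration). The function names are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence

ADHD, NON_ADHD = 1, 0  # hypothetical label encoding


def majority_vote(votes: Sequence[int]) -> int:
    """Return ADHD if a strict majority of the binary votes is ADHD.

    For the three-model ensemble this is exactly "at least two concur".
    """
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of voters avoids ties")
    n_adhd = sum(1 for v in votes if v == ADHD)
    return ADHD if 2 * n_adhd > len(votes) else NON_ADHD


def ensemble_predict(
    llama: Sequence[int], roberta: Sequence[int], svm: Sequence[int]
) -> List[int]:
    """Combine per-narrative predictions from the three models."""
    if not (len(llama) == len(roberta) == len(svm)):
        raise ValueError("all models must score the same narratives")
    return [majority_vote(trio) for trio in zip(llama, roberta, svm)]


# Toy predictions for four narratives (illustrative, not study data).
llama_preds = [1, 1, 0, 0]
roberta_preds = [1, 0, 1, 0]
svm_preds = [0, 1, 1, 1]
print(ensemble_predict(llama_preds, roberta_preds, svm_preds))  # [1, 1, 1, 0]
```

With three voters, a tie cannot occur, so no tie-breaking rule is needed. With an even number of models, one would have to be chosen.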