A new training approach for text classification in Mental Health: LatentGLoss

Korhan Sevinç

TL;DR

The paper tackles mental health classification of user-generated text using a multi-stage evaluation that spans traditional ML to transformer models. The core contribution is DualLatentGNet, a teacher-student architecture in which the teacher's latent representations and logits are modeled with a Gaussian Mixture Model and used to guide the student through LatentGLoss, applied in addition to standard supervised losses. The key findings show that LatentGLoss combined with latent distribution alignment yields state-of-the-art performance on a curated seven-class mental health dataset, surpassing BERT variants while offering efficiency benefits. The approach has practical implications for scalable, accessible mental health monitoring and early intervention in real-world deployments.

Abstract

This study presents a multi-stage approach to mental health classification that leverages traditional machine learning algorithms, deep learning architectures, and transformer-based models. A novel dataset was curated and used to evaluate the performance of various methods, starting with conventional classifiers and advancing through neural networks. To broaden the architectural scope, recurrent neural networks (RNNs) such as LSTM and GRU were also evaluated to explore their effectiveness in modeling sequential patterns in the data. Subsequently, transformer models such as BERT were fine-tuned to assess the impact of contextual embeddings in this domain. Beyond these baseline evaluations, the core contribution of this study is a novel training strategy built on a dual-model architecture composed of a teacher and a student network. Unlike standard distillation techniques, this method does not rely on soft label transfer; instead, it modifies the loss function so that information flows through both the teacher model's output and its latent representations. The experimental results highlight the effectiveness of each modeling stage and demonstrate that the proposed loss function and teacher-student interaction significantly enhance the model's learning capacity in mental health prediction tasks.
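
To make the idea of a loss that combines a supervised term with latent-space guidance concrete, the following is a minimal sketch and not the paper's definition of LatentGLoss. It assumes a diagonal-covariance Gaussian mixture has already been fitted to the teacher's latent vectors, and it penalizes the student when its latent vector has low density under that mixture. The function names, the weighting factor `lam`, and the exact form of the latent term are all illustrative assumptions.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, label):
    # Standard supervised loss for one example with an integer class label.
    return -log_softmax(logits)[label]

def gmm_log_density(z, weights, means, variances):
    # Log density of vector z under a diagonal-covariance Gaussian mixture.
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for zi, mi, vi in zip(z, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (zi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    m = max(comp_logs)  # log-sum-exp over mixture components
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def combined_loss(student_logits, student_latent, label, gmm, lam=0.1):
    # Hypothetical combination: supervised cross-entropy plus a weighted
    # negative log-likelihood of the student latent under the teacher GMM.
    # This is an assumed form, not the formula from the paper.
    weights, means, variances = gmm
    supervised = cross_entropy(student_logits, label)
    latent_term = -gmm_log_density(student_latent, weights, means, variances)
    return supervised + lam * latent_term

# Hypothetical two-component mixture over 2-D teacher latents.
teacher_gmm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
near = combined_loss([2.0, 0.0, 0.0], [0.1, -0.1], 0, teacher_gmm)
far = combined_loss([2.0, 0.0, 0.0], [10.0, 10.0], 0, teacher_gmm)
print(near < far)
```

With identical logits and labels, the student latent that lies close to a mixture component incurs the smaller loss, which is the sense in which a teacher-derived latent distribution can steer the student without any soft label transfer. In practice the mixture would be fitted with a library and the loss computed on tensors; the pure-Python version here only shows the arithmetic.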

Paper Structure

This paper contains 15 sections, 5 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Distribution of classes in the final dataset.
  • Figure 2: Text length distribution of the final dataset.
  • Figure 3: Dual architecture diagram.