mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis

Dae-young Kim, Rebecca Hwa, Muhammad Mahbubur Rahman

TL;DR

The paper addresses the challenge of building effective mental health NLP systems under limited computing resources by introducing mhGPT, a 1.98B-parameter transformer trained on a fusion of expert (PubMed) and lay (Reddit) mental-health data. It combines a custom tokenizer, sliding-window data sampling, and parameter-efficient fine-tuning with LoRA and NEFTune, including 4-bit quantization to maximize efficiency. mhGPT is shown to outperform at least one larger, social-media-trained model (MentaLLaMA) and match or exceed the performance of MentalBERT and MentalRoBERTa on several downstream tasks, with NEFTune further boosting performance on imbalanced data. These results demonstrate that expert-knowledge-infused, smaller LLMs can deliver strong mental health text analysis in low-resource settings, enabling broader and more accessible AI-enabled mental health support while highlighting areas for future validation and interpretability.
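To make the NEFTune component mentioned above more concrete, the following is a minimal sketch of the noise-injection idea as it is generally described: during fine-tuning, uniform noise in [-1, 1] is added to the token embeddings, scaled by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension. The function name `neftune_noise`, the value `alpha=5.0`, and the toy embedding shape are assumptions chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a sequence of token embeddings.

    embeddings: list of L vectors, each a list of d floats.
    alpha: noise scale (hypothetical value, not from the paper).
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    # NEFTune-style scaling: alpha / sqrt(L * d)
    scale = alpha / math.sqrt(seq_len * dim)
    # Add independent Uniform(-1, 1) noise to every embedding entry.
    return [
        [value + scale * rng.uniform(-1.0, 1.0) for value in row]
        for row in embeddings
    ]


# Toy example: 4 tokens with 8-dimensional zero embeddings.
embeddings = [[0.0] * 8 for _ in range(4)]
noisy = neftune_noise(embeddings, alpha=5.0, rng=random.Random(0))
bound = 5.0 / math.sqrt(4 * 8)  # largest possible perturbation per entry
```

In practice the noise would be applied only in training mode and removed at inference time; the sketch leaves that switch out for brevity.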

Abstract

This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models like MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT outperformed larger models and matched the performance of models trained on significantly more data. The key contributions include integrating diverse mental health data, creating a custom tokenizer, and optimizing a smaller architecture for low-resource settings. This research could advance AI-driven mental health care, especially in areas with limited computing power.

Paper Structure (25 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: mhGPT Overview
  • Figure 2: LLM training progress comparison.
  • Figure 3: MultiWD dataset class distribution.
  • Figure 4: mhGPT-1.98B fine-tuning comparison with baseline model Google Gemma-2B.