mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis
Dae-young Kim, Rebecca Hwa, Muhammad Mahbubur Rahman
TL;DR
The paper addresses the challenge of building effective mental health NLP systems under limited computing resources by introducing mhGPT, a 1.98B-parameter transformer trained on a fusion of expert (PubMed) and lay (Reddit) mental-health data. It combines a custom tokenizer, sliding-window data sampling, and parameter-efficient fine-tuning with LoRA and NEFTune, including 4-bit quantization to maximize efficiency. mhGPT is shown to outperform at least one larger, social-media-trained model (MentaLLaMA) and match or exceed the performance of MentalBERT and MentalRoBERTa on several downstream tasks, with NEFTune further boosting performance on imbalanced data. These results demonstrate that expert-knowledge-infused, smaller LLMs can deliver strong mental health text analysis in low-resource settings, enabling broader and more accessible AI-enabled mental health support while highlighting areas for future validation and interpretability.
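To make the NEFTune step mentioned above concrete, here is a minimal sketch of the noise injection as it is generally described: during fine-tuning, uniform noise in [-1, 1] is scaled by alpha / sqrt(L * d) and added to the token embeddings, where L is the sequence length and d the embedding dimension. The function name `neftune_noise`, the list-of-lists embedding layout, and the value `alpha=5.0` are illustrative assumptions, not details taken from the mhGPT paper.

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of `embeddings` (seq_len rows, each of length dim).

    Hypothetical helper for illustration: adds Uniform(-1, 1) noise scaled by
    alpha / sqrt(seq_len * dim), the NEFTune scaling rule. The alpha value is
    an assumption, not a setting reported for mhGPT.
    """
    if rng is None:
        rng = random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    # Build new rows so the input embeddings are left untouched.
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in row] for row in embeddings]


# Toy input: 4 tokens with 8-dimensional zero embeddings.
emb = [[0.0] * 8 for _ in range(4)]
noisy = neftune_noise(emb, alpha=5.0, rng=random.Random(0))
bound = 5.0 / math.sqrt(4 * 8)  # largest possible perturbation per entry
```

In practice the noise is applied only during training and disabled at inference time; a real pipeline would operate on framework tensors rather than Python lists.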
Abstract
This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media posts and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models such as MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT outperformed larger models and matched the performance of models trained on significantly more data. The key contributions include integrating diverse mental health data, creating a custom tokenizer, and optimizing a smaller architecture for low-resource settings. This research could advance AI-driven mental health care, especially in areas with limited computing power.
