
A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task

Laerdon Kim

TL;DR

This work tackles automated identification of adaptive and maladaptive self-states in mental health data from Reddit, for CLPsych 2025 Task A.1. Using a 4-bit quantized Gemma 2 9B model without fine-tuning, it compares three approaches: a sentence-level Baseline, a context-enhanced variant of that Baseline, and an LLM-based span-identification method. The Baseline with context yields the best overall recall and the most balanced performance. Span-based methods improve maladaptive recall at the expense of adaptive recall, and an adaptive-boost variant partially recovers the adaptive signal. The study highlights how data preprocessing, sentence granularity, and prompting choices shape language model performance on nuanced affective content, and it points toward future hybrid approaches that combine the strengths of both strategies.

Abstract

We present a baseline for CLPsych 2025 Task A.1: classifying self-states in mental health data taken from Reddit. We use few-shot learning with a 4-bit quantized Gemma 2 9B model and a pipeline that first splits posts into sentences and identifies those that show evidence of a self-state, and then performs binary classification to determine whether each such sentence is evidence of an adaptive or a maladaptive self-state. This system outperforms our alternative method, which relies on an LLM to independently highlight spans of variable length. We attribute our model's performance to the sentence-chunking step, for two reasons: partitioning posts into sentences (1) broadly matches the granularity at which self-states were annotated by humans, and (2) reduces the language model's task to a binary classification problem. Our system places third out of fourteen systems submitted for Task A.1, achieving a test-time recall of 0.579.
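The two-stage, sentence-level pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex sentence splitter, the prompt wording, and the `generate` callable (standing in for a few-shot prompted Gemma 2 9B call) are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical text-in/text-out LLM interface (e.g. a few-shot prompted
# Gemma 2 9B model); this is NOT a real library API.
Generate = Callable[[str], str]


def split_sentences(post: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", post.strip()) if s]


def few_shot_prompt(instruction: str, shots: List[Tuple[str, str]], sentence: str) -> str:
    """Instruction, then labeled examples, then the query sentence (hypothetical format)."""
    parts = [instruction]
    for text, label in shots:
        parts.append(f"Sentence: {text}\nAnswer: {label}")
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


def classify_post(post: str, generate: Generate,
                  relevance_shots: List[Tuple[str, str]],
                  polarity_shots: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Stage 1 keeps sentences judged to be self-state evidence;
    stage 2 labels each kept sentence as adaptive or maladaptive."""
    evidence: Dict[str, List[str]] = {"adaptive": [], "maladaptive": []}
    for sentence in split_sentences(post):
        relevant = generate(few_shot_prompt(
            "Does this sentence show evidence of a self-state? Answer yes or no.",
            relevance_shots, sentence)).strip().lower()
        if not relevant.startswith("yes"):
            continue  # not self-state evidence; skip it
        label = generate(few_shot_prompt(
            "Is this self-state adaptive or maladaptive?",
            polarity_shots, sentence)).strip().lower()
        # Assumption: any answer not starting with "maladaptive" counts as adaptive.
        key = "maladaptive" if label.startswith("maladaptive") else "adaptive"
        evidence[key].append(sentence)
    return evidence
```

Splitting first means each LLM call only sees one short sentence and only has to answer yes/no or adaptive/maladaptive, which mirrors the abstract's argument that chunking reduces the task to binary classification.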

Paper Structure

This paper contains 19 sections, 1 table.