OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations

Jens Albrecht, Robert Lehmann, Aleksandra Poltermann, Eric Rudolph, Philipp Steigerwald, Mara Stieler

TL;DR

OnCoCo 1.0 addresses the lack of public, fine-grained online counseling data by introducing a bilingual, annotated dataset with 38 counselor and 28 client categories. It extends schemes based on Motivational Interviewing (MI) by integrating diverse counseling approaches and provides robust baselines using multilingual transformers, achieving competitive performance. The work demonstrates the dataset's applicability for quality assurance, education, issue detection, chatbot evaluation, and resource allocation, and makes the data and models publicly available. This contributes a scalable, interpretable resource for social-work and mental-health dialogue analysis with potential to improve online counseling practice and research.

Abstract

This paper presents OnCoCo 1.0, a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrative system of categories designed to improve the automated analysis of psychosocial online counseling conversations. Existing category systems, predominantly based on Motivational Interviewing (MI), are limited by their narrow focus and their dependence on datasets derived mainly from face-to-face counseling, which restricts the detailed examination of textual counseling conversations. In response, we developed a comprehensive new coding scheme that differentiates between 38 types of counselor utterances and 28 types of client utterances, and created a labeled dataset of about 2,800 messages from counseling conversations. We fine-tuned several models on our dataset to demonstrate its applicability. The data and models are publicly available to researchers and practitioners. Our work thus contributes a new type of fine-grained conversational resource to the language resources community, extending existing datasets for social and mental-health dialogue analysis.

Paper Structure

This paper contains 33 sections, 4 tables.