Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish

Aylin Gunal, Bowen Yi, John Piette, Rada Mihalcea, Verónica Pérez-Rosas

TL;DR

This paper presents MIDAS, the first publicly available Spanish Motivational Interviewing dataset, created from public video sources and annotated for counselor questions and reflections using the MITI ITEM scheme. It analyzes cross-language differences in MI strategies by comparing Spanish and English conversations with LIWC-based language-use and sentiment analyses, and benchmarks monolingual and multilingual classifiers (e.g., BERT variants) for predicting counselor behaviors. Key contributions include a large-scale Spanish MI resource, language-specific insights into conversational dynamics, and baseline classification results illustrating the value of language-aligned data for MI coding. The work enables more accurate NLP tools for Spanish-speaking mental health contexts and highlights the importance of language-specific data in psychotherapy research, with the dataset publicly available for future work.

Abstract

Cultural and language factors significantly influence counseling, but Natural Language Processing research has not yet examined whether findings from conversational analysis of counseling conducted in English apply to other languages. This paper presents a first step in this direction. We introduce MIDAS (Motivational Interviewing Dataset in Spanish), a counseling dataset created from public video sources that contains expert annotations for counseling reflections and questions. Using this dataset, we explore language-based differences in counselor behavior in English and Spanish and develop classifiers in monolingual and multilingual settings, demonstrating the dataset's applications in counselor behavioral coding tasks.

Paper Structure

This paper contains 10 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Mean word exchange rates across Spanish and English conversations.
  • Figure 2: Counselor sentiment across languages.