MedAI Dialog Corpus (MEDIC): Zero-Shot Classification of Doctor and AI Responses in Health Consultations
Olumide E. Ojo, Olaronke O. Adebanji, Alexander Gelbukh, Hiram Calvo, Anna Feldman
TL;DR
This paper investigates whether zero-shot transformer models can distinguish doctor-written from AI-generated text in health consultations without corpus-specific training. It introduces a healthcare text corpus comprising doctor responses, ChatGPT-generated responses, and rephrased doctor responses across three datasets (DC, DR, DCR) and evaluates BART, BERT, XLM, XLM-R, and DistilBERT on binary and multiclass tasks. Results show that, while the models demonstrate general language understanding, their zero-shot accuracy is limited in this domain: the best F1 is about 0.58 on DC and about 0.52 on DR, and much lower on DCR, suggesting a need for corpus-informed training or few-shot approaches. The work provides a dataset and baseline that guide future research toward more robust text attribution in medical contexts and inform the design of trustworthy AI systems in healthcare.
Abstract
Zero-shot classification enables text to be classified into classes not seen during training. In this study, we examine the efficacy of zero-shot learning models in classifying healthcare consultation responses from doctors and AI systems. The models evaluated are BART, BERT, XLM, XLM-R, and DistilBERT. They were tested on three datasets, in both binary and multi-label settings, to identify the origin of text in health consultations without any prior training on the corpus. Our findings show that the zero-shot language models have a good general understanding of language but have limitations when classifying doctor and AI responses in healthcare consultations. This research provides a foundation for future work in medical text classification by informing the development of more accurate methods for classifying text written by doctors and AI systems in health consultations.
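
To make the binary versus multi-label distinction concrete, the following is a minimal sketch of how zero-shot label scores are commonly derived when a natural language inference (NLI) model is used as the classifier. This is an assumption for illustration: the abstract does not state how the evaluated models score candidate labels. The label names, the logit values, and the helper functions `single_label_scores` and `multi_label_scores` are all hypothetical, and no model is loaded; the logits stand in for what an NLI model might return for hypotheses such as "This text was written by a doctor."

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Single-label setting: one distribution over all candidate labels.

    entailment_logits maps label -> entailment logit for that label's
    hypothesis. The scores compete with each other and sum to 1.
    """
    labels = list(entailment_logits)
    probs = softmax([entailment_logits[label] for label in labels])
    return dict(zip(labels, probs))


def multi_label_scores(nli_logits):
    """Multi-label setting: each label is scored independently.

    nli_logits maps label -> (contradiction_logit, entailment_logit).
    Each score is the entailment probability against contradiction only,
    so the scores need not sum to 1.
    """
    return {
        label: softmax([contradiction, entailment])[1]
        for label, (contradiction, entailment) in nli_logits.items()
    }


# Hypothetical logits for one consultation response (not from the paper).
entailment_only = {"doctor": 1.2, "AI": 2.0, "rephrased doctor": -0.5}
single = single_label_scores(entailment_only)
predicted = max(single, key=single.get)

contradiction_and_entailment = {
    "doctor": (0.3, 1.2),
    "AI": (-1.0, 2.0),
    "rephrased doctor": (1.5, -0.5),
}
multi = multi_label_scores(contradiction_and_entailment)
```

In the single-label case the candidate labels compete for probability mass, whereas in the multi-label case a response can score high or low on every label at once, which is one reason the two settings can yield different results on the same text.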
