
Evaluating Embeddings for One-Shot Classification of Doctor-AI Consultations

Olumide Ebenezer Ojo, Olaronke Oluwayemisi Adebanji, Alexander Gelbukh, Hiram Calvo, Anna Feldman

TL;DR

The paper addresses the challenge of distinguishing Doctor-written from AI-generated medical consultations in low-data settings by evaluating multiple embeddings (BoW/TF-IDF, character n-grams, Word2Vec, GloVe, fastText, and GPT2) within one-shot classifiers across variants of the MEDIC dataset. It finds that Word2Vec, GloVe, and character n-grams consistently provide strong semantic representations, whereas GPT2 embeddings perform more variably depending on the dataset and model. The results reveal substantial interactions between models and embeddings, as well as dataset-specific effects, with no single embedding universally superior. By guiding the choice of embeddings and models for reliable doctor-patient text classification with little training data, and for quality control of online medical advice, this work informs the practical deployment of AI in healthcare.
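To make the one-shot setting concrete, here is a minimal sketch, not the authors' implementation, of one-shot classification with character n-gram features. Each class (Doctor-written, AI-generated) has exactly one labeled support example, and a query receives the label of the most cosine-similar support example. The nearest-neighbor rule, the trigram size, the helper names (`char_ngrams`, `cosine`, `one_shot_predict`), and the example sentences are hypothetical assumptions for illustration; none of them come from the paper or the MEDIC dataset.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def one_shot_predict(query: str, support: dict[str, str], n: int = 3) -> str:
    """Assign the label of the single support example most similar to the query."""
    q = char_ngrams(query, n)
    return max(support, key=lambda label: cosine(q, char_ngrams(support[label], n)))


# Hypothetical one-example-per-class support set (not drawn from MEDIC).
SUPPORT = {
    "doctor": "Take the tablets twice daily and come back if the fever persists.",
    "ai": "As an AI language model, I cannot diagnose you, but it may be helpful "
          "to consult a healthcare professional.",
}

if __name__ == "__main__":
    query = ("As an AI language model, I cannot diagnose this, "
             "but please consult a healthcare professional.")
    print(one_shot_predict(query, SUPPORT))  # prints: ai
```

A nearest-neighbor rule is used here only because it is the simplest classifier that works with a single example per class; the one-shot classifiers evaluated in the paper may differ.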

Abstract

Effective communication between healthcare providers and patients is crucial to providing high-quality patient care. In this work, we investigate how Doctor-written and AI-generated texts in healthcare consultations can be classified using state-of-the-art embeddings and one-shot classification systems. By analyzing bag-of-words, character n-gram, Word2Vec, GloVe, fastText, and GPT2 embeddings, we examine how well our one-shot classification systems capture semantic information within medical consultations. Results show that the embeddings capture semantic features from text in a reliable and adaptable manner. Overall, Word2Vec, GloVe, and character n-gram embeddings performed well, indicating that they are suitable for models targeting this task. GPT2 embeddings also performed notably well, suggesting that they are likewise suitable for models tailored to this task. When training data are scarce, our machine learning architectures significantly improved the quality of health conversations, which in turn improves communication between patients and healthcare providers.
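Word2Vec, GloVe, and fastText produce one vector per word, so a consultation has to be pooled into a single document vector before a one-shot classifier can compare it with the support examples. A common approach, assumed here and not confirmed by the paper, is to average the vectors of in-vocabulary tokens. The sketch below uses hypothetical 3-dimensional toy vectors (`TOY_VECTORS`) in place of real pretrained embeddings, and the helper names and sentences are likewise illustrative.

```python
import math
import re

# Hypothetical 3-dimensional toy vectors standing in for pretrained Word2Vec or
# GloVe embeddings (real vectors typically have 100 to 300 dimensions).
TOY_VECTORS = {
    "take": [0.9, 0.1, 0.0],
    "tablet": [0.8, 0.2, 0.1],
    "daily": [0.7, 0.1, 0.2],
    "ai": [0.0, 0.9, 0.1],
    "model": [0.1, 0.8, 0.2],
    "consult": [0.2, 0.7, 0.3],
}


def doc_embedding(text: str, vectors: dict[str, list[float]]) -> list[float]:
    """Mean-pool the vectors of in-vocabulary tokens; unknown tokens are skipped.

    (fastText would instead build vectors for unknown words from subword n-grams.)
    """
    dim = len(next(iter(vectors.values())))
    tokens = re.findall(r"[a-z]+", text.lower())
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def one_shot_predict(query: str, support: dict[str, str],
                     vectors: dict[str, list[float]]) -> str:
    """Label the query with the class of its most similar single support example."""
    q = doc_embedding(query, vectors)
    return max(support, key=lambda label: cosine(q, doc_embedding(support[label], vectors)))


# Hypothetical support set with exactly one example per class.
SUPPORT = {
    "doctor": "Take one tablet daily.",
    "ai": "As an AI model, I suggest you consult a doctor.",
}

if __name__ == "__main__":
    print(one_shot_predict("Please take this tablet daily", SUPPORT, TOY_VECTORS))  # prints: doctor
```

Mean pooling discards word order, which is one reason contextual embeddings such as GPT2 can behave differently from static word vectors on the same task.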

Paper Structure

This paper contains 15 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Accuracy for Different Models and Feature Combinations in the DC Dataset
  • Figure 2: Accuracy for Different Models and Feature Combinations in the DR Dataset
  • Figure 3: Accuracy for Different Models and Feature Combinations in the DCR Dataset
  • Figure 4: Accuracy heatmap for DC Dataset
  • Figure 5: Accuracy heatmap for DR Dataset
  • ...and 1 more figure