"Dialogue" vs "Dialog" in NLP and AI research: Statistics from a Confused Discourse

David Gros

TL;DR

This study investigates the spelling variation between 'dialogue' and 'dialog' in NLP/AI research using a large corpus. It defines Dialog(ue) Papers and High Impact Dialog(ue) Venues, assembles data from the Semantic Scholar corpus up to March 2024, and analyzes distribution, time trends, author-level patterns, nationality effects, and contextual influences with several methods (noun-phrase analysis, RoBERTa embeddings, morphology, and source-code usage). Key findings: 'dialogue' dominates (72%), 'dialog' remains substantial (24%), and some papers mix both spellings (5%); there is no clear long-term shift; and neither context nor author nationality strongly predicts spelling. Code and compound usage hint that economy drives some choices. The results offer a descriptive framework for orthography in scientific discourse and highlight the need for body-text analyses and cross-field comparisons to understand spelling variance more deeply. The work offers practical insights for researchers, editors, and tool builders navigating spelling conventions in computing literature.

Abstract

Within computing research, there are two spellings for an increasingly important term: dialogue and dialog. We analyze thousands of research papers to understand this "dialog(ue) debacle". Among publications in top venues that use "dialog(ue)" in the title or abstract, 72% use "dialogue", 24% use "dialog", and 5% use both in the same title and abstract. This split distribution is more common in Computing than in any other academic discipline. We investigate trends over ~20 years of NLP/AI research and find no clear evidence of a shift over time. Author nationality is weakly correlated with spelling choice, but is far from explaining the mixed use. Many prolific authors publish papers with both spellings. We use several methods (such as syntactic parses and LM embeddings) to study how the context of dialog(ue) influences spelling, finding limited influence. Combining these results, we discuss different theories that might explain the dialog(ue) divergence.
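The three-way split reported above (only "dialogue", only "dialog", or both) can be illustrated with a minimal sketch. This section does not give the paper's actual matching rules, so the regular expression, the handling of plurals, and the function names `classify_spelling` and `spelling_shares` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed pattern (not taken from the paper): "dialog" or "dialogue",
# with an optional plural "s", matched case-insensitively as a whole word.
_DIALOG_RE = re.compile(r"\bdialog(ue)?s?\b", re.IGNORECASE)


def classify_spelling(text):
    """Label a title+abstract string as 'dialogue', 'dialog', 'both', or None."""
    has_long = has_short = False
    for match in _DIALOG_RE.finditer(text):
        if match.group(1):  # the optional "ue" group participated in the match
            has_long = True
        else:
            has_short = True
    if has_long and has_short:
        return "both"
    if has_long:
        return "dialogue"
    if has_short:
        return "dialog"
    return None  # the text does not mention dialog(ue) at all


def spelling_shares(texts):
    """Fraction of each label among the texts that mention dialog(ue)."""
    labels = (classify_spelling(t) for t in texts)
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("dialogue", "dialog", "both")}
```

Calling `spelling_shares` on a list of concatenated title and abstract strings yields the share of each category among texts that mention the term. Words such as "dialogic" are not counted, because the pattern requires a word boundary after the optional plural.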
Paper Structure (32 sections, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Examples of varying uses of dialog(ue) in prominent NLP/AI research.
  • Figure 2: Distribution for CS Dialog(ue) Publications.
  • Figure 3: Dialog(ue) Papers across disciplines.
  • Figure 4: CS Dialog(ue) Publications by Year. To reduce noise, we group into 2-year intervals. The shaded area around the line is a 95% 2-year bootstrapped CI.
  • Figure 5: Distribution for 100 authors with most CS Dialog(ue) Publications.
  • ...and 5 more figures