Investigating Thematic Patterns and User Preferences in LLM Interactions using BERTopic

Abhay Bhandarkar, Gaurav Mishra, Khushi Juchani, Harsh Singhal

TL;DR

This paper demonstrates how BERTopic can uncover thematically coherent patterns in a large, multilingual LLM conversation corpus (LMSYS-Chat-1M) and relate these topics to human preferences collected via the Chatbot Arena framework. The authors design a rigorous preprocessing and modeling pipeline (embedding with all-MiniLM-L6-v2, UMAP dimensionality reduction, HDBSCAN clustering, and c-TF-IDF topic labeling) to extract 29 distinct topics from over a million conversational records. Using human preference data, they quantify topic-specific model performance and reveal strong domain-specific strengths among top LLMs, while also showing that no model achieves uniform excellence across all topics. The findings offer practical guidance for domain-focused fine-tuning and model selection, and point to future work extending topic-centric evaluation to multimodal settings for broader real-world applicability.
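The last stage of the pipeline named above, c-TF-IDF topic labeling, can be illustrated with a minimal standard-library sketch. It assumes the weighting documented for BERTopic, where a term's L1-normalized frequency within a topic is multiplied by log(1 + A / f_t), with A the average number of words per topic and f_t the term's frequency across all topics. The whitespace tokenizer, the helper names `c_tf_idf` and `top_terms`, and the toy documents are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Return {topic: {term: weight}} using a simplified class-based TF-IDF.

    docs_by_topic maps a topic id to the list of documents assigned to it.
    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    # Merge each topic's documents into a single "class document" of tokens.
    tokens_by_topic = {
        topic: [tok for doc in docs for tok in doc.lower().split()]
        for topic, docs in docs_by_topic.items()
    }

    # f_t: frequency of each term summed over all topics.
    total_freq = Counter()
    for tokens in tokens_by_topic.values():
        total_freq.update(tokens)

    # A: average number of words per topic.
    avg_words = sum(len(t) for t in tokens_by_topic.values()) / len(tokens_by_topic)

    weights = {}
    for topic, tokens in tokens_by_topic.items():
        counts = Counter(tokens)
        n_tokens = len(tokens) or 1  # guard against an empty topic
        weights[topic] = {
            term: (count / n_tokens) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights, k=3):
    """Pick the k highest-weighted terms per topic (ties broken alphabetically)."""
    return {
        topic: [term for term, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for topic, w in weights.items()
    }


# Hypothetical toy corpus: two clusters, as HDBSCAN might produce them.
toy = {
    0: ["python function error", "python loop bug"],
    1: ["cloud server deploy", "cloud storage bucket"],
}
print(top_terms(c_tf_idf(toy), k=1))
```

In this toy corpus the term repeated inside a cluster receives the highest weight there, which is the behaviour that lets c-TF-IDF produce a readable label for each cluster.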

Abstract

This study applies BERTopic, a transformer-based topic modeling technique, to the LMSYS-Chat-1M dataset, a multilingual conversational corpus built from head-to-head evaluations of large language models (LLMs). Each user prompt is paired with two anonymized LLM responses and a human preference label that captures how the user judged the competing outputs. The main objective is to uncover thematic patterns in these conversations and examine how they relate to user preferences, in particular whether certain LLMs are consistently preferred within specific topics. A robust preprocessing pipeline was designed to handle multilingual variation, balance dialogue turns, and clean noisy or redacted data. BERTopic extracted over 29 coherent topics, including artificial intelligence, programming, ethics, and cloud infrastructure. We analysed relationships between topics and model preferences to identify trends in model-topic alignment. Visualization techniques included inter-topic distance maps, topic probability distributions, and model-versus-topic matrices. Our findings inform domain-specific fine-tuning and optimization strategies for improving real-world LLM performance and user satisfaction.
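A model-versus-topic matrix of the kind mentioned above can be sketched as a per-topic win rate computed from pairwise preference records. The record layout `(topic, model_a, model_b, winner)`, the winner labels `"model_a"`, `"model_b"`, and `"tie"`, the function name `topic_win_rates`, and the model names are all hypothetical; the paper's exact scoring scheme is not given in this section.

```python
from collections import defaultdict


def topic_win_rates(battles):
    """Return {(model, topic): win_rate} from pairwise battle records.

    battles is an iterable of (topic, model_a, model_b, winner) tuples, where
    winner is "model_a", "model_b", or "tie" (an assumed label scheme).
    A tie counts as an appearance for both models but as a win for neither.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for topic, model_a, model_b, winner in battles:
        # Both models appear in this battle for this topic.
        games[(model_a, topic)] += 1
        games[(model_b, topic)] += 1
        if winner == "model_a":
            wins[(model_a, topic)] += 1
        elif winner == "model_b":
            wins[(model_b, topic)] += 1
    return {key: wins.get(key, 0) / n for key, n in games.items()}


# Hypothetical records; topics would come from the topic model's assignments.
battles = [
    ("programming", "model-x", "model-y", "model_a"),
    ("programming", "model-y", "model-x", "model_b"),
    ("programming", "model-x", "model-y", "tie"),
    ("ethics", "model-x", "model-y", "model_b"),
]
for (model, topic), rate in sorted(topic_win_rates(battles).items()):
    print(f"{model:8s} {topic:12s} {rate:.2f}")
```

Counting ties as non-wins is one possible convention; awarding half a win per tie would be an equally defensible choice and would change the matrix values accordingly.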

Paper Structure

This paper contains 26 sections, 6 equations, 11 figures.

Figures (11)

  • Figure 1: Architectural diagram of the topic modeling pipeline.
  • Figure 2: Distribution of LLM appearances in the LMSYS-Chat-1M dataset.
  • Figure 3: Overall win/loss/tie distribution in pairwise evaluations (34.9%/34.2%/30.9% split).
  • Figure 4: User preference for shorter vs. longer responses (57.9%/42.1% split).
  • Figure 5: Clustering through HDBSCAN.
  • ...and 6 more figures