WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Yuntian Deng, Wenting Zhao, Jack Hessel, Xiang Ren, Claire Cardie, Yejin Choi

TL;DR

This work introduces WildVis, an interactive, open-source tool for fast, versatile, large-scale analysis of chat logs. Search index construction, embedding precomputation and compression, and caching keep user interactions responsive within seconds, even on million-scale datasets.

Abstract

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis' utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.
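
Two of the optimizations named in the abstract, search index construction and caching, can be illustrated with a minimal sketch. This is not the WildVis implementation: it is a toy in-memory inverted index with a memoized lookup, and the corpus, the `build_index` helper, and the `search` function are all hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy corpus: conversation ID -> conversation text.
CONVERSATIONS = {
    1: "help me write a python script",
    2: "draft an email to my professor",
    3: "python homework about loops",
}


def build_index(conversations):
    """Map each lowercase token to the set of conversation IDs containing it."""
    index = defaultdict(set)
    for conv_id, text in conversations.items():
        for token in text.lower().split():
            index[token].add(conv_id)
    return index


# Built once up front, so each query is a dictionary lookup, not a corpus scan.
INDEX = build_index(CONVERSATIONS)


@lru_cache(maxsize=1024)
def search(keyword):
    """Return the sorted IDs of conversations containing the keyword.

    Results are memoized, so a repeated query is served from the cache.
    """
    return tuple(sorted(INDEX.get(keyword.lower(), ())))


print(search("python"))
print(search("python"))  # second call is a cache hit
print(search.cache_info())
```

The design point is that the expensive work (tokenizing every conversation) happens once at build time, while repeated queries skip even the index lookup. A production system at million-conversation scale would presumably rely on a dedicated search engine rather than a Python dictionary.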

Paper Structure

This paper contains 35 sections and 12 figures.

Figures (12)

  • Figure 1: Illustration of an exact, compositional filter-based search in WildVis. This example demonstrates the application of multiple criteria, including the keyword "Election," conversations with more than two turns, and chats from users in Florida.
  • Figure 2: WildVis Filter-Based Search Page. This screenshot shows the application of multiple filters, including conversation content ("homework"), non-toxicity, and language (English), to narrow down the search results. The interface displays relevant conversations that match the specified criteria. Users can click on each conversation ID to navigate to the conversation details page. Additionally, metadata in the displayed results, such as the hashed IP address, is clickable, allowing users to filter based on that specific metadata.
  • Figure 3: WildVis Embedding Visualization page. Each dot represents a conversation, with green dots from WildChat, blue dots from LMSYS-Chat-1M, and red dots highlighting conversations that match the applied filters (containing "python" in this example). Users can interact with the visualization by hovering over dots to preview a conversation and clicking on a dot to navigate to the full conversation. This figure has been enhanced to show a representative example from each category: "WildChat," "LMSYS-Chat-1M," and "Filter Match."
  • Figure 4: System Architecture: Overview of the data flow from user query submission to result rendering in the browser. The software tools used in the frontend, backend, and search engine are italicized.
  • Figure 5: Major topic clusters. (a) Coding (identified by searching for "python"). (b) Writing assistance (identified by searching for "email"). (c) Story generation (identified by searching for "story"). (d) Math question answering (identified by searching for "how many").
  • ...and 7 more figures
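
The compositional filter-based search described in the Figure 1 caption (keyword "Election", more than two turns, users in Florida) can be sketched as a conjunction of independent predicates. This is a minimal illustration under stated assumptions, not WildVis code: the `Conversation` record, its field names, the predicate helpers, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    # Hypothetical record layout; the real datasets carry more metadata.
    conv_id: int
    text: str
    num_turns: int
    state: str


def contains(keyword):
    """Predicate: conversation text contains the keyword (case-insensitive)."""
    return lambda c: keyword.lower() in c.text.lower()


def more_than_turns(n):
    """Predicate: conversation has strictly more than n turns."""
    return lambda c: c.num_turns > n


def from_state(state):
    """Predicate: conversation comes from a user in the given state."""
    return lambda c: c.state == state


def filter_conversations(conversations, *predicates):
    """Keep only the conversations that satisfy every predicate."""
    return [c for c in conversations if all(p(c) for p in predicates)]


SAMPLE = [
    Conversation(1, "Who won the election?", 3, "Florida"),
    Conversation(2, "Election results please", 1, "Florida"),
    Conversation(3, "Tell me about the election", 4, "Texas"),
    Conversation(4, "Write a poem", 5, "Florida"),
]

matches = filter_conversations(
    SAMPLE, contains("Election"), more_than_turns(2), from_state("Florida")
)
print([c.conv_id for c in matches])
```

Because each criterion is a separate predicate, filters can be added or dropped without touching the others, which mirrors the way the caption combines multiple criteria in a single search.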