Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections

Maria Leonor Pacheco; Tunazzina Islam; Lyle Ungar; Ming Yin; Dan Goldwasser

TL;DR

This paper expands the definition of a theme beyond a word distribution to include generalized concepts deemed relevant by domain experts. It also proposes an interactive framework that receives and encodes expert feedback at different levels of abstraction.

Abstract

Experts across diverse disciplines are often interested in making sense of large text collections. Traditionally, this challenge is approached either by noisy unsupervised techniques such as topic models, or by following a manual theme discovery process. In this paper, we expand the definition of a theme to account for more than just a word distribution, and include generalized concepts deemed relevant by domain experts. Then, we propose an interactive framework that receives and encodes expert feedback at different levels of abstraction. Our framework strikes a balance between automation and manual coding, allowing experts to maintain control of their study while reducing the manual effort required.

Paper Structure

This paper contains 24 sections, 1 equation, 29 figures, and 11 tables.

Introduction
Related Work
The Framework
Representing Themes and Instances
Interaction Protocol
Mapping and Re-partitioning
Case Studies
Coverage vs. Mapping Quality
Effects of Consecutive Iterations
Consistency between Different Expert Groups
Abstract Themes vs. Word-level Topics
Limitations
Summary
Appendix
Tool Screenshots
...and 9 more sections

Figures (29)

Figure 1: Framework Overview
Figure 2: Theme Assignments Where Distance to Theme Centroid ≤ Quartile
Figure 3: Confusion matrix for Covid after second iteration. Values are normalized over the predicted themes (cols), and sorted from best to worst.
Figure 4: Shifting predictions for Immigration. Themes added during second iteration are shown in red, and values are normalized over the full population.
Figure 5: Theme Overlap Coefficient Heatmap between Different Groups of Experts
...and 24 more figures
