ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration
Sunwoo Ha, Chaehun Lim, R. Jordan Crouser, Alvitta Ottley
TL;DR
ConFides addresses the problem of opaque ASR confidence with a visual analytics platform that communicates word- and segment-level uncertainty to analysts. It integrates AWS Transcribe outputs and offers three coordinated views (Confidence Overview, Transcription Editor, and Context Word Tree) that support uncertainty-aware exploration, editing, and pattern discovery. The paper describes a design process grounded in iterative collaboration with intelligence analysts and uses a Nixon Tapes case study to demonstrate practical use, including decisions about when to rely on AI outputs. The work discusses implications for textual data cleaning and model transparency, and outlines future work on improving human-machine collaboration in high-stakes analytic workflows.
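As a rough illustration of the word-level confidence data that a tool like this builds on, the sketch below reads a transcript shaped like Amazon Transcribe's JSON output (`results.items`, each with `type` and `alternatives` carrying `content` and `confidence`). The sample values, the function names, the 0.8 threshold, and the use of a simple mean as a segment-level score are all assumptions made for illustration; they are not taken from the paper and do not describe how ConFides itself computes or displays confidence.

```python
# Hypothetical sample shaped like an Amazon Transcribe result.
# The words and confidence values are made up for illustration.
SAMPLE = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.3",
             "alternatives": [{"confidence": "0.99", "content": "The"}]},
            {"type": "pronunciation", "start_time": "0.3", "end_time": "0.8",
             "alternatives": [{"confidence": "0.52", "content": "tapes"}]},
            {"type": "pronunciation", "start_time": "0.8", "end_time": "1.1",
             "alternatives": [{"confidence": "0.97", "content": "were"}]},
            # Punctuation items carry no timing and a placeholder confidence.
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
}


def word_confidences(transcript):
    """Return (word, confidence) pairs for spoken words, skipping punctuation."""
    words = []
    for item in transcript["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]  # top-ranked alternative for this token
        words.append((best["content"], float(best["confidence"])))
    return words


def flag_low_confidence(words, threshold=0.8):
    """Return words whose confidence falls below an assumed review threshold."""
    return [word for word, conf in words if conf < threshold]


def mean_confidence(words):
    """Average word confidence, used here as a simple stand-in segment score."""
    if not words:
        return None
    return sum(conf for _, conf in words) / len(words)


if __name__ == "__main__":
    pairs = word_confidences(SAMPLE)
    print(flag_low_confidence(pairs))
    print(mean_confidence(pairs))
```

A threshold-based flag is only one possible way to direct an analyst's attention; a continuous encoding of confidence avoids committing to a single cutoff.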
Abstract
Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, which prevents ASR from being integrated seamlessly into analytical workflows. In this paper, we introduce ConFides, a visual analytics system developed in collaboration with intelligence analysts to address this issue. ConFides aims to aid exploration and post-AI-transcription editing by visually representing the confidence associated with the transcription. We demonstrate how our tool can assist intelligence analysts who use ASR outputs in their analytical and exploratory tasks and how it can help mitigate misinterpretation of crucial information. We also discuss opportunities for improving textual data cleaning and model transparency for human-machine collaboration.
