ConFides: A Visual Analytics Solution for Automated Speech Recognition Analysis and Exploration

Sunwoo Ha, Chaehun Lim, R. Jordan Crouser, Alvitta Ottley

TL;DR

ConFides addresses the problem of opaque ASR confidence by providing a visual analytics platform that communicates word- and segment-level uncertainty to analysts. It integrates AWS Transcribe outputs and offers three coordinated views (Confidence Overview, Transcription Editor, and Context Word Tree) that support uncertainty-aware exploration, editing, and pattern discovery. The paper describes a design methodology grounded in iterative collaboration with intelligence analysts and presents a Nixon Tapes case study that demonstrates practical use and shows how analysts can decide when to rely on AI outputs. The work highlights implications for textual data cleaning and model transparency, and points to future work on improving human-machine collaboration in high-stakes analytic workflows.

Abstract

Confidence scores of automatic speech recognition (ASR) outputs are often inadequately communicated, which prevents the seamless integration of those outputs into analytical workflows. In this paper, we introduce ConFides, a visual analytics system developed in collaboration with intelligence analysts to address this issue. ConFides aims to aid exploration and post-AI-transcription editing by visually representing the confidence associated with the transcription. We demonstrate how our tool can assist intelligence analysts who use ASR outputs in their analytical and exploratory tasks and how it can help mitigate misinterpretation of crucial information. We also discuss opportunities for improving textual data cleaning and model transparency for human-machine collaboration.

Paper Structure (20 sections, 3 figures)

This paper contains 20 sections and 3 figures.

Figures (3)

  • Figure 1: The framework of ConFides. The audio files are uploaded and sent to AWS for automatic transcription. Users can then select which transcriptions to explore and analyze.
  • Figure 2: Searching for "pandas" in the current transcription revealed two instances. We can observe that the first instance of this search term has a confidence score of 52%.
  • Figure 3: After searching for "pan," we observe "zoo" in the word tree. This indicates that "panda" was misclassified as "pan" and hints that a zoo may be where the pandas will be kept.
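
The searches described in Figures 2 and 3 can be illustrated with a minimal sketch. It is not the ConFides implementation. It assumes a transcript shaped like Amazon Transcribe's JSON output, where each word item carries its best alternative and a confidence stored as a string. The helper names (`word_confidences`, `search`, `following_context`) and the sample sentence with its confidence values are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, float]  # (token, confidence in [0, 1])


def word_confidences(doc: Dict) -> List[Word]:
    """Collect (word, confidence) pairs, skipping punctuation items."""
    words = []
    for item in doc["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]  # top-ranked alternative
        words.append((best["content"], float(best["confidence"])))
    return words


def search(words: List[Word], term: str) -> List[Tuple[int, str, float]]:
    """Return (index, word, confidence) for words starting with `term`."""
    term = term.lower()
    return [(i, w, c) for i, (w, c) in enumerate(words)
            if w.lower().startswith(term)]


def following_context(words: List[Word], term: str, width: int = 4) -> List[List[str]]:
    """For each hit, list up to `width` following words (word-tree branches)."""
    return [[w for w, _ in words[i + 1:i + 1 + width]]
            for i, _, _ in search(words, term)]


def _item(content: str, confidence: float) -> Dict:
    # Hypothetical helper that builds one word item for the sample below.
    return {"type": "pronunciation",
            "alternatives": [{"content": content, "confidence": str(confidence)}]}


# Invented sample data; the words and scores are NOT taken from the paper.
SAMPLE = {"results": {"items": [
    _item("the", 0.99), _item("pan", 0.41), _item("goes", 0.93),
    _item("to", 0.98), _item("the", 0.99), _item("zoo", 0.95),
    {"type": "punctuation", "alternatives": [{"content": ".", "confidence": "0.0"}]},
    _item("pandas", 0.52), _item("eat", 0.9), _item("bamboo", 0.97),
]}}

words = word_confidences(SAMPLE)
for index, word, conf in search(words, "pan"):
    print(f"{index}: {word} ({conf:.0%})")
print(following_context(words, "pan"))
```

Searching for the shorter prefix surfaces both the low-confidence "pan" and the later "pandas", and the following-word lists play the role of word-tree branches: seeing "zoo" after "pan" is the kind of contextual hint Figure 3 describes.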