Chart What I Say: Exploring Cross-Modality Prompt Alignment in AI-Assisted Chart Authoring

Nazar Ponochevnyi, Anastasia Kuzminykh

TL;DR

The paper investigates cross-modality prompt alignment in AI-assisted chart authoring by comparing spoken and typed instructions. It collects a voice dataset (n=25, 100 prompts) and compares it with two text datasets (NLV Corpus and nvBench), applying qualitative coding to identify 6 input strategies and 22 chart-element types across 5 categories. Voice prompts turn out to be longer, more diverse, and linguistically more complex, while text prompts concentrate on basic chart elements, which signals a need for modality-specific processing and design guidelines. The authors contribute a design framework for voice-enabled authoring, actionable guidelines for extending text-based systems to support the speech modality, and a publicly available voice-instruction dataset for developing and evaluating multimodal visualization tools.

Abstract

Recent chart-authoring systems, such as Amazon Q in QuickSight and Copilot for Power BI, demonstrate an emergent focus on supporting natural language input to share meaningful insights from data through chart creation. Currently, chart-authoring systems tend to integrate voice input capabilities by relying on speech-to-text transcription, processing spoken and typed input similarly. However, cross-modality input comparisons in other interaction domains suggest that the structure of spoken and typed-in interactions could notably differ, reflecting variations in user expectations based on interface affordances. Thus, in this work, we compare spoken and typed instructions for chart creation. Findings suggest that while both text and voice instructions cover chart elements and element organization, voice descriptions have a variety of command formats, element characteristics, and complex linguistic features. Based on these findings, we developed guidelines for designing voice-based authoring-oriented systems and additional features that can be incorporated into existing text-based systems to support speech modality.

Paper Structure (10 sections, 3 figures, 1 table)

This paper contains 10 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Examples of the text stimuli provided to participants of the user study. Source: Statista.
  • Figure 2: Summary of the word count of the 76 voice, 200 text, and 200 synthetic text instructions.
  • Figure 3: The number of times each element of 5 major types was applied to each input strategy in voice chart-authoring instructions (A), text instructions (B), and synthetic text instructions (C). Color intensity corresponds to data magnitude.