
VisTR: Visualizations as Representations for Time-series Table Reasoning

Jianing Hao, Zhuowen Liang, Chunting Li, Yuyu Luo, Jie Li, Wei Zeng

TL;DR

VisTR is a framework that uses visualizations as representational proxies for time-series table reasoning. It transforms tables into fixed-size visualization references and aligns charts, text, and hand-drawn sketches in a joint embedding space, which improves pattern recognition across short and long time scales and mitigates context drift. The system combines a data-facet visualization referencing pipeline, a multimodal alignment model trained with two-level cross-entropy and bidirectional triplet losses, and a pruning and indexing backend (Chroma) that supports scalable multimodal querying through an interactive interface. Quantitative results and case studies show strong multimodal alignment and practical usefulness for exploring financial and environmental datasets. The approach points toward integrating visual reasoning into table-based analytics and multimodal data exploration, with possible extensions to broader table types and queries.
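The TL;DR names a bidirectional triplet loss with charts as the supervising modality but does not define it. The sketch below shows one common form of such a loss, a hinge-based triplet loss applied in both directions, under stated assumptions: embeddings are plain float vectors, similarity is cosine, the margin is 0.2, and every non-matching item in a batch acts as a negative. The function names are hypothetical, and the two-level cross-entropy term is left out because its exact definition is not given here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bidirectional_triplet_loss(
    charts: List[Vector], others: List[Vector], margin: float = 0.2
) -> float:
    """Hinge triplet loss in both directions over a batch of matched pairs.

    Assumption (illustrative only): charts[i] and others[i] (a text or sketch
    embedding) form a positive pair; all other batch items are negatives.
    """
    n = len(charts)
    if n != len(others) or n < 2:
        raise ValueError("need two equal-length batches with at least 2 items")
    # sim[i][j] = similarity between chart i and text/sketch j
    sim = [[cosine(c, o) for o in others] for c in charts]
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # chart -> other: chart i should be closer to its own pair than to other j
            total += max(0.0, margin - pos + sim[i][j])
            # other -> chart: other i should be closer to chart i than to chart j
            total += max(0.0, margin - pos + sim[j][i])
    return total / n
```

A batch in which each chart matches its own text or sketch gives a loss of 0, while a swapped pairing is penalized. Summing over all negatives rather than only the hardest one is a design choice; hardest-negative mining is a common alternative.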

Abstract

Time-series table reasoning interprets temporal patterns and relationships in data to answer user queries. Despite recent advances that leverage large language models (LLMs), existing methods often struggle with pattern recognition and with context drift in long time series, and they lack visual reasoning capabilities. To address these challenges, we propose VisTR, a framework that places visualizations at the core of the reasoning process. Specifically, VisTR uses visualizations as representations that bridge raw time-series data and human cognitive processes. By transforming tables into fixed-size visualization references, it captures key trends, anomalies, and temporal relationships, facilitating intuitive and interpretable reasoning. These visualizations are aligned with user inputs (charts, text, and sketches) through a fine-tuned multimodal LLM, ensuring robust cross-modal alignment. To handle large-scale data, VisTR integrates pruning and indexing mechanisms for scalable and efficient retrieval. Finally, an interactive visualization interface supports multimodal exploration, letting users interact with data through both textual and visual modalities. Quantitative evaluations demonstrate the effectiveness of VisTR in aligning multimodal inputs and improving reasoning accuracy. Case studies further illustrate its applicability to various time-series reasoning and exploration tasks.
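One way to read "fixed-size visualization references" is that a long series is cut into windows and each window is drawn on a canvas of the same dimensions, so the model always sees charts of one size regardless of table length. The sketch below assumes a sliding window with a fixed size and stride and maps each window to polyline vertices on a width-by-height grid using min-max scaling. The window size, stride, canvas size, and function names are illustrative assumptions, not VisTR's actual parameters, and a real pipeline would rasterize these points into chart images.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def sliding_windows(
    values: Sequence[float], size: int, stride: int
) -> List[Tuple[int, List[float]]]:
    """Return (start_index, window) pairs of exactly `size` consecutive values."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [(s, list(values[s:s + size])) for s in range(0, len(values) - size + 1, stride)]


def to_fixed_canvas(window: Sequence[float], width: int = 64, height: int = 32) -> List[Point]:
    """Map one window onto a width x height canvas as polyline vertices.

    x is spread evenly across the width; y is min-max scaled so every window
    fills the same vertical range regardless of magnitude (y=0 is the top row).
    """
    if not window:
        return []
    n = len(window)
    lo, hi = min(window), max(window)
    span = hi - lo or 1.0  # flat window: avoid division by zero
    points = []
    for i, v in enumerate(window):
        x = round(i * (width - 1) / (n - 1)) if n > 1 else 0
        y = round((hi - v) / span * (height - 1))
        points.append((x, y))
    return points
```

Min-max scaling makes a window's shape independent of its magnitude, which suits trend and anomaly matching but discards absolute levels; those would need to travel alongside the chart, for example as metadata.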

Paper Structure

This paper contains 23 sections, 3 equations, 10 figures, and 1 table.

Figures (10)

  • Figure 1: VisTR uses visualizations as representations to bridge the gap between time-series data and user queries. VisTR enhances existing LLM-based table reasoning methods by enabling robust recognition of data change patterns (Q1), improving pattern recognition for long time series (Q2), and facilitating visual pattern exploration (Q3).
  • Figure 2: An overview of the proposed framework, which contains four main modules: visualization referencing, visualization pruning, visualization alignment, and visualization interaction.
  • Figure 3: An overview of the visualization alignment module, which covers data preparation for chart-text and chart-sketch pairs and aligns the chart, sketch, and text modalities.
  • Figure 4: The alignment pipeline for the three modalities. Each modality is processed by a separate encoder, and the chart modality serves as the supervisor that guides the alignment process.
  • Figure 5: The pipeline of visualization referencing and visualization pruning modules. Visualization pruning and indexing are used to accelerate the storage and retrieval process.
  • Figures 6 to 10: captions not shown.
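Figure 5 attributes faster storage and retrieval to pruning and indexing. The sketch below illustrates that idea under explicit assumptions: each visualization reference already has an embedding vector, pruning greedily drops any reference whose cosine similarity to an already kept one is at least 0.95, and retrieval is a brute-force top-k scan by cosine similarity. The threshold and function names are hypothetical; VisTR itself uses Chroma for indexing, and this linear scan only stands in for it.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prune(embeddings: Dict[str, Vector], threshold: float = 0.95) -> Dict[str, Vector]:
    """Greedily keep a reference only if it is not near-duplicate of a kept one.

    Assumption: insertion order decides which of two near-duplicates survives.
    """
    kept: Dict[str, Vector] = {}
    for key, vec in embeddings.items():
        if all(cosine(vec, other) < threshold for other in kept.values()):
            kept[key] = vec
    return kept


def top_k(index: Dict[str, Vector], query: Vector, k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours by cosine similarity, best match first."""
    scored = [(key, cosine(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The linear scan costs O(N) similarity computations per query; a vector store such as Chroma exists to avoid that cost on large collections, and pruning shrinks N before anything is indexed.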