VisTR: Visualizations as Representations for Time-series Table Reasoning
Jianing Hao, Zhuowen Liang, Chunting Li, Yuyu Luo, Jie Li, Wei Zeng
TL;DR
VisTR is a framework that treats visualizations as representational proxies for time-series table reasoning. It transforms tables into fixed-size visualization references and aligns charts, text, and hand-drawn sketches in a joint embedding space. This improves pattern recognition across both short and long time scales while mitigating context drift. The system has three parts: a data-facet visualization referencing pipeline; a multimodal alignment model trained with two-level cross-entropy and bidirectional triplet losses; and a pruning and indexing backend built on the Chroma vector database. Together, these support scalable multimodal querying through an interactive interface. Quantitative results and case studies show strong multimodal alignment and practical usefulness for exploring financial and environmental datasets. The approach points toward integrating visual reasoning into table-based analytics and multimodal data exploration, with possible extensions to broader table types and query kinds.
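The alignment objective above includes bidirectional triplet losses. The following is a minimal sketch of one common sum-of-hinges formulation from image-text retrieval, not VisTR's exact loss. It assumes cosine similarity, in-batch negatives, and a hypothetical margin of 0.2. The paper's margin, negative-mining strategy, and weighting against the two-level cross-entropy term are not specified here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def bidirectional_triplet_loss(
    charts: List[Vector], queries: List[Vector], margin: float = 0.2
) -> float:
    """Hinge triplet loss summed over both retrieval directions.

    charts[i] and queries[i] are assumed to be a matched (positive) pair;
    every other index j != i in the batch is used as a negative.
    margin=0.2 is a hypothetical default, not a value taken from VisTR.
    """
    n = len(charts)
    if n != len(queries) or n < 2:
        raise ValueError("need at least two aligned chart/query pairs")
    # sim[i][j] = similarity between chart i and query j
    sim = [[cosine(c, q) for q in queries] for c in charts]
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # chart i as anchor: a mismatched query j should score lower than query i
            total += max(0.0, margin - pos + sim[i][j])
            # query i as anchor: a mismatched chart j should score lower than chart i
            total += max(0.0, margin - pos + sim[j][i])
    return total / n


# Perfectly aligned pairs incur no loss; swapped pairs are penalized in both directions.
aligned = bidirectional_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = bidirectional_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Penalizing both directions keeps chart-to-text and text-to-chart retrieval symmetric. This matters when a user may start from either modality.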
Abstract
Time-series table reasoning interprets temporal patterns and relationships in data to answer user queries. Despite recent advances with large language models (LLMs), existing methods often struggle with pattern recognition and with context drift in long time series, and they lack visual reasoning capabilities. To address these challenges, we propose VisTR, a framework that places visualizations at the core of the reasoning process. Specifically, VisTR uses visualizations as representations that bridge raw time-series data and human cognitive processes. By transforming tables into fixed-size visualization references, it captures key trends, anomalies, and temporal relationships, facilitating intuitive and interpretable reasoning. These visualizations are aligned with user inputs (charts, text, and sketches) through a fine-tuned multimodal LLM, ensuring robust cross-modal alignment. To handle large-scale data, VisTR integrates pruning and indexing mechanisms for scalable and efficient retrieval. Finally, an interactive visualization interface supports seamless multimodal exploration, letting users interact with data through both textual and visual modalities. Quantitative evaluations demonstrate the effectiveness of VisTR in aligning multimodal inputs and improving reasoning accuracy. Case studies further illustrate its applicability to various time-series reasoning and exploration tasks.
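To make "fixed-size references" and "pruning and indexing" concrete, here is a hedged, purely numeric sketch. VisTR renders chart images, embeds them with a fine-tuned multimodal LLM, and stores the embeddings in Chroma; none of that is reproduced here. Instead, this sketch makes the following assumptions:

- Each series is linearly resampled to a hypothetical fixed length `REF_SIZE` and z-normalized, as a stand-in for a fixed-size chart.
- References are bucketed by a hypothetical coarse trend key ("up", "down", "flat"), which serves as the pruning rule.
- Queries are ranked by cosine similarity within their own bucket.

The trend-key rule and the in-memory `PrunedIndex` class are illustrative assumptions, not the paper's pruning criterion or Chroma's API.

```python
import math
from typing import Dict, List, Sequence, Tuple

REF_SIZE = 32  # hypothetical fixed length standing in for a fixed-size chart


def resample(series: Sequence[float], size: int = REF_SIZE) -> List[float]:
    """Linearly interpolate a series of any length (>= 2) onto `size` evenly spaced points."""
    n = len(series)
    if n < 2 or size < 2:
        raise ValueError("series and size must both have at least 2 points")
    out = []
    for k in range(size):
        pos = k * (n - 1) / (size - 1)  # fractional index into the original series
        lo = min(int(pos), n - 2)       # left neighbour, clipped so lo + 1 stays in range
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out


def z_normalize(v: Sequence[float]) -> List[float]:
    """Remove offset and scale so that shape, not magnitude, drives similarity."""
    mean = sum(v) / len(v)
    std = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
    return [(x - mean) / std for x in v] if std > 0 else [0.0] * len(v)


def trend_key(v: Sequence[float], tol: float = 0.5) -> str:
    """Hypothetical coarse pruning key: overall direction of a z-normalized reference."""
    delta = v[-1] - v[0]
    return "up" if delta > tol else "down" if delta < -tol else "flat"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PrunedIndex:
    """In-memory stand-in for a vector store: references are bucketed by trend_key,
    and a query is scored only against the bucket that matches its own key."""

    def __init__(self) -> None:
        self.buckets: Dict[str, List[Tuple[str, List[float]]]] = {}

    def add(self, ref_id: str, series: Sequence[float]) -> None:
        vec = z_normalize(resample(series))
        self.buckets.setdefault(trend_key(vec), []).append((ref_id, vec))

    def query(self, series: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        vec = z_normalize(resample(series))
        candidates = self.buckets.get(trend_key(vec), [])  # pruning: skip other buckets
        scored = [(ref_id, cosine(vec, ref)) for ref_id, ref in candidates]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


idx = PrunedIndex()
idx.add("rising", [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
idx.add("falling", [10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
idx.add("peak", [1, 3, 5, 7, 9, 9, 7, 5, 3, 1])
up_hits = idx.query([5, 6, 7, 8])  # shorter query, same rising shape
down_hits = idx.query([9, 4, 2])
```

Because every reference has the same length regardless of the source table's size, comparison cost no longer grows with the number of rows. The bucket lookup then narrows the search before any similarity is computed. A pruning rule this coarse can discard relevant matches, so a real system would need a more careful criterion.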
