MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model

Chengze Zhang, Changshan Li, Shiyang Gao

TL;DR

The paper addresses the challenge of automated, context-aware storytelling from multi-dimensional data by introducing MDSF, a framework based on large language models. It combines automated data parsing, augmented analysis, structured insight scoring (including Importance, Significance, Surprise, Fatigue, and Interpretability), and an agent-driven context storytelling component to produce coherent narratives with visuals. Empirical results across private and public datasets show MDSF achieves strong insight ranking (SFD), high description accuracy, and competitive data story generation, with user studies highlighting superior structure and richness. This framework advances practical data storytelling by reducing manual intervention, mitigating interpretive bias, and enabling real-time, context-sensitive narratives suitable for diverse domains.

Abstract

The exponential growth of data and advancements in big data technologies have created a demand for more efficient and automated approaches to data analysis and storytelling. However, automated data analysis systems still face challenges in leveraging large language models (LLMs) for data insight discovery, augmented analysis, and data storytelling. This paper introduces the Multidimensional Data Storytelling Framework (MDSF) based on large language models for automated insight generation and context-aware storytelling. The framework incorporates advanced preprocessing techniques, augmented analysis algorithms, and a unique scoring mechanism to identify and prioritize actionable insights. The use of fine-tuned LLMs enhances contextual understanding and generates narratives with minimal manual intervention. The architecture also includes an agent-based mechanism for real-time storytelling continuation control. Key findings reveal that MDSF outperforms existing methods across various datasets in terms of insight ranking accuracy, descriptive quality, and narrative coherence. The experimental evaluation demonstrates MDSF's ability to automate complex analytical tasks, reduce interpretive biases, and improve user satisfaction. User studies further underscore its practical utility in enhancing content structure, conclusion extraction, and richness of detail.

Paper Structure (34 sections, 5 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Comparison of traditional and intelligent data analysis pipelines.
  • Figure 2: An Overview of MDSF.
  • Figure 3: Performance comparison of different models based on the ACC metric.
  • Figure 4: User study results.