
Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data

Zijian Li, Jingjing Fu, Lei Song, Jiang Bian, Jun Zhang, Rui Wang

TL;DR

This work tackles the scarcity of high-quality, fine-grained chart reasoning data for multimodal LLMs (MLLMs) by introducing Chain of Functions (CoF), a two-stage data synthesis pipeline that first explores atomic chart functions to form diverse, explainable reasoning chains and then translates them into natural-language rationales and questions via reverse linguistic CoT. The resulting ChartCoF dataset covers 19 chart types, with 1.4k Q&As for fine-grained evaluation and 50k Q&As for reasoning enhancement. Finetuning on it yields state-of-the-art performance among same-scale models and improved generalization to out-of-distribution chart types. The approach reduces reliance on extremely large models, improves data precision and diversity, and may apply to other step-wise reasoning tasks beyond charts, such as math Q&A and GUI-oriented reasoning. Overall, CoF offers a scalable, explainable path to stronger chart reasoning in MLLMs.
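To make the first stage concrete, here is a minimal Python sketch of exploring function chains, assuming a chart has already been converted into a simple label-to-value table. The atomic functions (`max`, `min`, `first`, `last`, `difference`, `ratio`), the table values, and the helper `explore_chains` are hypothetical illustrations, not the paper's function set or implementation. What it shows is that every answer is computed by executing the chain, so it is exact by construction rather than generated by a model.

```python
import operator
from itertools import permutations

# Hypothetical chart table (label -> value), standing in for extracted chart data.
CHART = {"2019": 12.0, "2020": 15.5, "2021": 9.0, "2022": 20.0}

# Hypothetical retrieval functions: each returns one (label, value) data point.
SELECTORS = {
    "max": lambda s: max(s.items(), key=lambda kv: kv[1]),
    "min": lambda s: min(s.items(), key=lambda kv: kv[1]),
    "first": lambda s: next(iter(s.items())),
    "last": lambda s: list(s.items())[-1],
}

# Hypothetical arithmetic functions that combine two retrieved values.
OPERATIONS = {"difference": operator.sub, "ratio": operator.truediv}


def explore_chains(series: dict) -> list:
    """Enumerate selector -> selector -> operation chains and execute each one."""
    chains = []
    for a, b in permutations(SELECTORS, 2):
        label_a, value_a = SELECTORS[a](series)
        label_b, value_b = SELECTORS[b](series)
        if label_a == label_b:  # skip chains that compare a data point with itself
            continue
        for op_name, op in OPERATIONS.items():
            if op is operator.truediv and value_b == 0:  # avoid division by zero
                continue
            chains.append({
                "chain": [a, b, op_name],
                "steps": [(a, label_a, value_a), (b, label_b, value_b)],
                "answer": op(value_a, value_b),  # computed by execution, not generated
            })
    return chains


for c in explore_chains(CHART)[:3]:
    print(" -> ".join(c["chain"]), "=", round(c["answer"], 3))
```

Enumerating every valid combination, rather than asking a model to invent questions, is what gives the diversity; pruning degenerate chains (here, comparing a point with itself) keeps the set meaningful.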

Abstract

Visual reasoning is crucial for multimodal large language models (MLLMs) to address complex chart queries, yet high-quality rationale data remains scarce. Existing methods leverage (M)LLMs for data generation, but direct prompting often yields limited precision and diversity. In this paper, we propose Chain of Functions (CoF), a novel programmatic reasoning data generation pipeline that uses freely explored reasoning paths as supervision to ensure data precision and diversity. Specifically, it starts with human-free exploration among atomic functions (e.g., finding the maximum data value and arithmetic operations) to generate diverse function chains, which are then translated into linguistic rationales and questions with only a moderately sized open-source LLM. CoF provides multiple benefits: 1) Precision: function-governed generation reduces hallucinations compared to free-form generation; 2) Diversity: enumerating function chains enables varied question taxonomies; 3) Explainability: function chains serve as built-in rationales, allowing fine-grained evaluation beyond overall accuracy; 4) Practicality: it eliminates reliance on extremely large models. Employing CoF, we construct the ChartCoF dataset, with 1.4k complex reasoning Q&As for fine-grained analysis and 50k Q&As for reasoning enhancement. The fine-grained evaluation on ChartCoF reveals varying performance across question taxonomies for each MLLM, and the experiments also show that finetuning with ChartCoF achieves state-of-the-art performance among same-scale MLLMs on widely used benchmarks. Furthermore, the novel paradigm of function-governed rationale generation in CoF could inspire broader applications beyond charts.
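The translation step can be sketched as turning an executed function chain into explicit steps plus a prompt for a moderately sized LLM. In this minimal sketch, `render_rationale`, `build_prompt`, the prompt wording, and the example chain are all hypothetical; the actual LLM call is omitted, and the paper's own prompts are not reproduced here.

```python
def render_rationale(chain: dict) -> str:
    """Turn an executed function chain into numbered reasoning steps."""
    lines = []
    for i, (fn, label, value) in enumerate(chain["steps"], start=1):
        lines.append(f"Step {i}: apply {fn} to the data -> {label} = {value:g}")
    op = chain["chain"][-1]  # the final function combines the retrieved values
    lines.append(f"Step {len(chain['steps']) + 1}: compute the {op} -> {chain['answer']:g}")
    return "\n".join(lines)


def build_prompt(chain: dict) -> str:
    """Hypothetical prompt: the verified steps go in, the LLM only verbalizes them."""
    return (
        "Below is a verified step-by-step solution over a chart.\n"
        f"{render_rationale(chain)}\n"
        "Rewrite these steps as a natural-language rationale, then write one "
        "question about the chart whose answer is the final result."
    )


# Hypothetical executed chain: largest value minus smallest value.
example = {
    "chain": ["max", "min", "difference"],
    "steps": [("max", "2022", 20.0), ("min", "2021", 9.0)],
    "answer": 11.0,
}
print(build_prompt(example))
```

Because the reasoning and the answer are fixed before the LLM is involved, the model only has to verbalize them, which is a much easier task than solving the chart question, and is why a smaller open-source model can suffice.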

Paper Structure

This paper contains 42 sections, 5 figures, 16 tables.

Figures (5)

  • Figure 1: Our proposed CoF constructs a high-quality reasoning dataset ChartCoF for the fine-grained evaluation and reasoning enhancement of MLLMs.
  • Figure 2: Overview of chain of functions. We prompt LLMs to fill in a JSON template to construct a JSON seed, then evolve (modify) it into more accurate and diverse JSON data. The JSON data are used to generate function chains through functional discovery, and the function chains are then translated into CoT data by prompting LLMs.
  • Figure 3: (a) Accuracy of MLLMs with and without CoT strategies on ChartCoF. (b) Accuracy of MLLMs across questions with function chains of different lengths. (c) Accuracy of MLLMs across questions with different function chains; corresponding examples are given in the paper's table of example function chains. (d) Accuracy of MLLMs across questions with different short function chains.
  • Figure 4: Accuracy of InternVL2.5 series (2B, 8B, and 26B) on ChartBench, ChartX and ChartCoF.
  • Figure 5: Accuracy of InternVL2.5-8B on ChartBench, ChartX and ChartCoF.
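Because every ChartCoF question carries its function chain, accuracy can be broken down by chain length or by individual chain, as in Figure 3 (b) and (c). The sketch below assumes each evaluated question is stored as a record with its chain and a correctness flag; `accuracy_by` and the toy records are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def accuracy_by(records: list, key) -> dict:
    """Group evaluation records by key(record) and return per-group accuracy."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        k = key(r)
        totals[k] += 1
        correct[k] += int(r["correct"])
    return {k: correct[k] / totals[k] for k in sorted(totals)}


# Toy records for illustration only (not measured results).
records = [
    {"chain": ("max",), "correct": True},
    {"chain": ("max", "min", "difference"), "correct": True},
    {"chain": ("max", "min", "difference"), "correct": False},
    {"chain": ("first", "last", "ratio"), "correct": False},
]

print(accuracy_by(records, key=lambda r: len(r["chain"])))  # grouped by chain length
print(accuracy_by(records, key=lambda r: r["chain"]))       # grouped by chain
```

Passing the grouping as a function keeps one helper for both views; the same call could group by any other chain property, such as whether it contains an arithmetic step.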