Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering

Yanjie Zhang, Yafei Li, Rui Sheng, Zixin Chen, Yanna Lin, Huamin Qu, Lei Chen, Yushi Sun

Abstract

Despite the success of Vision-Language Models (VLMs), misleading charts remain a significant challenge due to their deceptive visual structures and distorted data representations. We present ChartCynics, an agentic dual-path framework designed to unmask visual deception via a "skeptical" reasoning paradigm. Unlike holistic models, ChartCynics decouples perception from verification: a Diagnostic Vision Path captures structural anomalies (e.g., inverted axes) through strategic ROI cropping, while an OCR-Driven Data Path ensures numerical grounding. To resolve cross-modal conflicts, we introduce an Agentic Summarizer optimized via a two-stage protocol: Oracle-Informed SFT for reasoning distillation and Deception-Aware GRPO for adversarial alignment. This pipeline effectively penalizes visual traps and enforces logical consistency. Evaluations on two benchmarks show that ChartCynics achieves 74.43% and 64.55% accuracy, an absolute gain of ~29% over the Qwen3-VL-8B backbone, and outperforms state-of-the-art proprietary models. Our results demonstrate that specialized agentic workflows can grant smaller open-source models superior robustness, establishing a new foundation for trustworthy chart interpretation.

Paper Structure

This paper contains 28 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of ChartCynics resolving an "Inverted Axis" deception. (Left) A counter-intuitive real-world case (Engel): the inverted Y-axis creates a visual illusion of declining deaths, hijacking standard VLM attention despite the underlying numerical increase. (Middle) Our dual-path architecture decouples perception from extraction. Crucially, pure OCR lacks spatial context, causing entity misalignment (e.g., confusing axis ticks with data points). The Diagnostic Vision Path detects the scale anomaly via ROI cropping and generates an Action Directive to guide the OCR Path in accurate entity mapping. (Right) The Agentic Summarizer resolves the cross-modal conflict by combining complementary information, namely the vision-derived structural baseline and the precise OCR numerals, to correctly infer a numerical increase.
  • Figure 2: Overview of the ChartCynics architecture. (a) Inference Pipeline: A training-free dual-path workflow that synthesizes visual diagnostics and OCR-extracted data through a structured reasoning chain. (b) Training-Based Optimization: A two-stage pipeline comprising Oracle-Informed SFT for logic distillation and Deception-Aware GRPO for adversarial alignment, utilizing asymmetric reward shaping to penalize visual traps.
  • Figure 3: Vision Path: Diagnostic-Augmented Investigation. The framework decouples perception from reasoning to mitigate confirmation bias. The Diagnostic Agent identifies structural anomalies via high-resolution ROI extraction and generates an Action Directive. Subsequently, the Reasoning Agent anchors its inference to these directives, ensuring that the final conclusion is grounded in structural evidence rather than global visual heuristics.
  • Figure 4: The Agentic Fusion with Detective Chain-of-Thought (D-CoT) process. The Reasoning Agent integrates the Diagnostic Report and OCR Markdown through a five-step detective framework, utilizing Golden Rules and Misleader Taxonomy to ensure adversarial robustness and evidence-based deduction.
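
The inverted-axis conflict described in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Diagnostic` class, the function names, and the sample values are hypothetical, and the sketch assumes the vision path has already flagged the inverted axis and the OCR path has already extracted the numbers. It only shows why the apparent on-screen trend and the numerical trend disagree, and why the final answer should follow the numbers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Diagnostic:
    """Hypothetical stand-in for a vision-path diagnostic report."""
    y_axis_inverted: bool


def numeric_trend(values: List[float]) -> str:
    """Classify the change from the first to the last extracted value."""
    delta = values[-1] - values[0]
    if delta > 0:
        return "increase"
    if delta < 0:
        return "decrease"
    return "flat"


def apparent_trend(values: List[float], diag: Diagnostic) -> str:
    """Trend a viewer would read from the line's on-screen direction alone."""
    trend = numeric_trend(values)
    if not diag.y_axis_inverted or trend == "flat":
        return trend
    # An inverted Y-axis draws larger values lower, so the slope flips.
    return "decrease" if trend == "increase" else "increase"


def resolve(values: List[float], diag: Diagnostic) -> Dict[str, object]:
    """Report both readings and answer from the numbers, not the picture."""
    visual = apparent_trend(values, diag)
    numeric = numeric_trend(values)
    return {
        "visual": visual,
        "numeric": numeric,
        "conflict": visual != numeric,
        "answer": numeric,
    }


# Illustrative values only (not taken from the paper or any dataset).
ocr_values = [500.0, 620.0, 710.0, 720.0]
result = resolve(ocr_values, Diagnostic(y_axis_inverted=True))
print(result)
```

With these made-up values the line appears to fall on an inverted axis while the extracted numbers rise, so the sketch flags a conflict and answers "increase". The actual framework resolves such conflicts with a trained Agentic Summarizer rather than a fixed rule like this one.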