Anagent For Enhancing Scientific Table & Figure Analysis

Xuehang Guo, Zhiyong Lu, Tom Hope, Qingyun Wang

TL;DR

Anagent tackles the challenge of multimodal scientific table & figure analysis by decomposing the problem into four specialized agents (Planner, Expert, Solver, Critic) guided by modular toolkits and a large AnaBench benchmark with $63{,}178$ instances across $9$ domains and $170$ subdisciplines. The framework emphasizes task planning, knowledge retrieval, context-aware reasoning, and reflective refinement to handle long-context multimodal data. Through supervised finetuning and reinforcement learning, along with test-time optimization, Anagent achieves substantial improvements, up to $\uparrow 13.43\%$ in training-free settings and $\uparrow 42.12\%$ with finetuning, across diverse backbone models and analysis complexities. This work highlights the importance of explicit planning, domain-grounded retrieval, and collaborative reasoning for robust scientific analysis, and provides AnaBench as a principled benchmark to guide future multimodal and multi-agent research in scientific domains.

Abstract

In scientific research, analysis requires accurately interpreting complex multimodal knowledge, integrating evidence from different sources, and drawing inferences grounded in domain-specific knowledge. However, current artificial intelligence (AI) systems struggle to consistently demonstrate such capabilities. The complexity and variability of scientific tables and figures, combined with heterogeneous structures and long-context requirements, pose fundamental obstacles to scientific table & figure analysis. To quantify these challenges, we introduce AnaBench, a large-scale benchmark featuring $63{,}178$ instances from nine scientific domains, systematically categorized along seven complexity dimensions. To tackle these challenges, we propose Anagent, a multi-agent framework for enhanced scientific table & figure analysis through four specialized agents: Planner decomposes tasks into actionable subtasks, Expert retrieves task-specific information through targeted tool execution, Solver synthesizes information to generate coherent analysis, and Critic performs iterative refinement through five-dimensional quality assessment. We further develop modular training strategies that leverage supervised finetuning and specialized reinforcement learning to optimize individual capabilities while maintaining effective collaboration. Comprehensive evaluation across $170$ subdomains demonstrates that Anagent achieves substantial improvements, up to $\uparrow 13.43\%$ in training-free settings and $\uparrow 42.12\%$ with finetuning, while revealing that task-oriented reasoning and context-aware problem-solving are essential for high-quality scientific table & figure analysis. Our project page: https://xhguo7.github.io/Anagent/.
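The four-agent division of labor described in the abstract can be pictured as a simple control loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every name here (`AnalysisState`, `run_pipeline`, the stub agent functions, the `threshold` and `max_rounds` parameters) is hypothetical, the agents are plain Python stubs rather than model calls, and the five Critic dimensions are left as unnamed placeholders because the abstract does not name them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AnalysisState:
    """Working memory shared by the four agents (hypothetical structure)."""
    task: str
    subtasks: List[str] = field(default_factory=list)
    evidence: Dict[str, str] = field(default_factory=dict)
    draft: str = ""
    rounds: int = 0  # number of Critic-triggered revisions so far


def planner(task: str) -> List[str]:
    # Stub: a real Planner would ask a model to decompose the task.
    return [f"locate data for: {task}", f"interpret trends for: {task}"]


def expert(subtask: str, tools: Dict[str, Callable[[str], str]]) -> str:
    # Stub: run every available tool on the subtask and join the outputs.
    return " | ".join(tool(subtask) for tool in tools.values())


def solver(state: AnalysisState) -> str:
    # Stub: merge the retrieved evidence into one analysis string.
    parts = [f"{s} -> {state.evidence[s]}" for s in state.subtasks]
    return f"Analysis (revision {state.rounds}): " + "; ".join(parts)


def critic(draft: str, rounds: int) -> Dict[str, float]:
    # Stub: five placeholder scores in [0, 1]. The draft is ignored here;
    # scores simply rise with each revision so that the loop terminates.
    score = min(1.0, 0.5 + 0.25 * rounds)
    return {f"dimension_{i}": score for i in range(1, 6)}


def run_pipeline(task: str,
                 tools: Dict[str, Callable[[str], str]],
                 threshold: float = 0.9,
                 max_rounds: int = 3) -> AnalysisState:
    state = AnalysisState(task=task)
    state.subtasks = planner(task)                  # Planner: decompose
    for subtask in state.subtasks:                  # Expert: retrieve
        state.evidence[subtask] = expert(subtask, tools)
    while True:
        state.draft = solver(state)                 # Solver: synthesize
        scores = critic(state.draft, state.rounds)  # Critic: assess
        if min(scores.values()) >= threshold or state.rounds >= max_rounds:
            return state
        state.rounds += 1                           # refine and retry


result = run_pipeline("Table 2 accuracy comparison",
                      {"table_lookup": lambda q: f"[table] {q}"})
print(result.draft)
```

The design choice illustrated is that planning and retrieval run once up front, while the Solver and Critic alternate until every placeholder score clears the threshold or the revision budget is exhausted.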

Paper Structure (109 sections, 29 equations, 51 figures, 13 tables, 1 algorithm)

Figures (51)

  • Figure 1: Scientific Analysis Workflow. Motivated by how human researchers perform scientific analysis, we decompose the scientific analysis workflow into dedicated stages, which leads to Anagent (Fig. \ref{fig:anagent}).
  • Figure 2: Challenges In Scientific Table & Figure Analysis. The heterogeneity of scientific literature presents great challenges for high-quality analysis of scientific tables and figures (Fig. \ref{fig:preliminary-error-analysis}).
  • Figure 3: AnaBench For Evaluating Autonomous Scientific Analysis. We implement a four-stage benchmark construction method to build AnaBench, with multi-level filtering to enhance data quality.
  • Figure 4: Multi-Agent Coordinative Scientific Analysis. Our multi-agent scientific analysis framework, Anagent, covers the stages of analyzing scientific tables and figures through four collaborative agents: Planner, Expert, Solver, and Critic. Some example details are omitted as [...] for clarity.
  • Figure 5: Few-Shot Learning Optimization (§\ref{subsec:anagent:infer})
  • ...and 46 more figures