Anagent For Enhancing Scientific Table & Figure Analysis
Xuehang Guo, Zhiyong Lu, Tom Hope, Qingyun Wang
TL;DR
Anagent tackles the challenge of multimodal scientific table & figure analysis by decomposing the problem into four specialized agents (Planner, Expert, Solver, Critic) equipped with modular toolkits, and introduces AnaBench, a large-scale benchmark with 63,178 instances across 9 domains and 170 subdisciplines. The framework emphasizes task planning, knowledge retrieval, context-aware reasoning, and reflective refinement to handle long-context multimodal data. Anagent is further optimized through supervised finetuning, reinforcement learning, and test-time optimization. It achieves substantial improvements across diverse backbone models and analysis complexities: up to ↑13.43% in training-free settings and up to ↑42.12% with finetuning. This work highlights the importance of explicit planning, domain-grounded retrieval, and collaborative reasoning for robust scientific analysis, and provides AnaBench as a principled benchmark to guide future multimodal and multi-agent research in scientific domains.
Abstract
In scientific research, analysis requires accurately interpreting complex multimodal knowledge, integrating evidence from different sources, and drawing inferences grounded in domain-specific knowledge. However, current artificial intelligence (AI) systems struggle to consistently demonstrate such capabilities. The complexity and variability of scientific tables and figures, combined with heterogeneous structures and long-context requirements, pose fundamental obstacles to scientific table & figure analysis. To quantify these challenges, we introduce AnaBench, a large-scale benchmark featuring 63,178 instances from nine scientific domains, systematically categorized along seven complexity dimensions. To tackle these challenges, we propose Anagent, a multi-agent framework for enhanced scientific table & figure analysis through four specialized agents: Planner decomposes tasks into actionable subtasks, Expert retrieves task-specific information through targeted tool execution, Solver synthesizes information to generate coherent analysis, and Critic performs iterative refinement through five-dimensional quality assessment. We further develop modular training strategies that leverage supervised finetuning and specialized reinforcement learning to optimize individual capabilities while maintaining effective collaboration. Comprehensive evaluation across 170 subdomains demonstrates that Anagent achieves substantial improvements, up to ↑13.43% in training-free settings and ↑42.12% with finetuning, while revealing that task-oriented reasoning and context-aware problem-solving are essential for high-quality scientific table & figure analysis. Our project page: https://xhguo7.github.io/Anagent/.
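The abstract describes a hand-off from Planner to Expert to Solver, with the Critic driving iterative refinement. The Python sketch below shows one way such a control flow could be wired. It is not the authors' implementation: the function names, the 0.7 threshold, the "every dimension must pass" acceptance rule, the round limit, and the toy stub agents are all hypothetical. The five quality dimensions are left unnamed because the abstract does not list them. In a real system each callable would wrap a model or tool call.

```python
from dataclasses import dataclass, field
from typing import Callable

# The abstract states that the Critic assesses five quality dimensions;
# their names are not listed there, so they stay unnamed in this sketch.
NUM_DIMENSIONS = 5


@dataclass
class Critique:
    scores: list[float]  # hypothetical: one score per dimension, in [0, 1]
    feedback: str = ""

    def __post_init__(self) -> None:
        if len(self.scores) != NUM_DIMENSIONS:
            raise ValueError(f"expected {NUM_DIMENSIONS} scores, got {len(self.scores)}")

    def passes(self, threshold: float) -> bool:
        # Hypothetical acceptance rule: every dimension must reach the threshold.
        return all(s >= threshold for s in self.scores)


@dataclass
class Trace:
    subtasks: list[str] = field(default_factory=list)
    evidence: dict[str, str] = field(default_factory=dict)
    drafts: list[str] = field(default_factory=list)
    critiques: list[Critique] = field(default_factory=list)


def run_pipeline(
    task: str,
    plan: Callable[[str], list[str]],                  # Planner: task -> subtasks
    retrieve: Callable[[str], str],                    # Expert: subtask -> evidence
    solve: Callable[[str, dict[str, str], str], str],  # Solver: task, evidence, feedback -> draft
    critique: Callable[[str, str], Critique],          # Critic: task, draft -> scores + feedback
    threshold: float = 0.7,
    max_rounds: int = 3,
) -> Trace:
    trace = Trace()
    trace.subtasks = plan(task)
    trace.evidence = {sub: retrieve(sub) for sub in trace.subtasks}
    feedback = ""
    for _ in range(max_rounds):
        draft = solve(task, trace.evidence, feedback)
        review = critique(task, draft)
        trace.drafts.append(draft)
        trace.critiques.append(review)
        if review.passes(threshold):
            break  # accepted: stop refining
        feedback = review.feedback  # otherwise pass the critique back to the Solver
    return trace


# Toy stand-ins for model/tool calls, purely for illustration.
def toy_plan(task: str) -> list[str]:
    return ["locate relevant table cells", "compare reported values"]


def toy_retrieve(subtask: str) -> str:
    return f"evidence for: {subtask}"


def toy_solve(task: str, evidence: dict[str, str], feedback: str) -> str:
    return "revised analysis" if feedback else "initial analysis"


def toy_critique(task: str, draft: str) -> Critique:
    if draft.startswith("revised"):
        return Critique([0.9] * NUM_DIMENSIONS)
    return Critique([0.9, 0.9, 0.4, 0.9, 0.9], feedback="dimension 3 is below threshold")


trace = run_pipeline("Explain the trend in the results table",
                     toy_plan, toy_retrieve, toy_solve, toy_critique)
print(trace.drafts)  # ['initial analysis', 'revised analysis']
```

Passing each agent in as a plain callable keeps the control flow separate from whichever backbone model or toolkit implements each role. The loop is also bounded by `max_rounds`, so a draft that never satisfies the Critic still terminates.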
