AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

Minjun Zhu, Zhen Lin, Yixuan Weng, Panzhong Lu, Qiujie Xie, Yifan Wei, Sifan Liu, Qiyao Sun, Yue Zhang

TL;DR

This work tackles the bottleneck of producing publication-ready scientific illustrations from long-context documents by introducing FigureBench, a large-scale benchmark of 3,300 long-text–illustration pairs, and AutoFigure, a two-stage, agentic framework based on the Reasoned Rendering paradigm. Stage I performs conceptual grounding and layout planning to produce a symbolic, structurally coherent blueprint, while Stage II renders this blueprint into high-fidelity visuals with an erase-and-correct post-processing pipeline to ensure textual accuracy. Through extensive automated and human evaluations, AutoFigure consistently surpasses baselines in visual design, communication, and content fidelity, achieving publication-ready quality across diverse document types. The work demonstrates a practical path toward automated, high-quality scientific visualization, with implications for broader AI-assisted scientific communication and future extensions to dynamic and interactive figures.

Abstract

High-quality scientific illustrations are crucial for communicating complex scientific and technical concepts, yet creating them manually remains a well-recognized bottleneck in both academia and industry. We present FigureBench, the first large-scale benchmark for generating scientific illustrations from long-form scientific texts. It contains 3,300 high-quality scientific text-figure pairs, covering diverse text-to-illustration tasks from scientific papers, surveys, blogs, and textbooks. Moreover, we propose AutoFigure, the first agentic framework that automatically generates high-quality scientific illustrations from long-form scientific text. Specifically, before rendering the final result, AutoFigure engages in extensive thinking, recombination, and validation to produce a layout that is both structurally sound and aesthetically refined, so that the final illustration is both structurally complete and visually appealing. Leveraging the high-quality data from FigureBench, we conduct extensive experiments comparing AutoFigure against various baseline methods. The results demonstrate that AutoFigure consistently surpasses all baseline methods, producing publication-ready scientific illustrations. The code, dataset, and Hugging Face Space are available at https://github.com/ResearAI/AutoFigure.
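
The abstract describes layout planning as a cycle of thinking, recombination, and validation that runs before any rendering. The following is a minimal sketch of such a bounded propose-and-critique loop, not the authors' implementation: `Layout`, `refine_layout`, the `propose` and `critique` callables, and the `max_iters` and `target` parameters are all hypothetical names introduced for illustration, and the scoring scale is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Layout:
    svg: str      # symbolic layout, e.g. SVG markup
    score: float  # critic score on an assumed scale where higher is better


def refine_layout(
    text: str,
    propose: Callable[[str, str], str],
    critique: Callable[[str, str], Tuple[float, str]],
    max_iters: int = 3,
    target: float = 0.9,
) -> Layout:
    """Run up to max_iters propose/critique rounds and keep the best layout.

    propose(text, feedback) returns a candidate layout (hypothetical generator).
    critique(text, svg) returns (score, feedback) (hypothetical validator).
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")

    feedback = ""  # the first proposal is made without any critique
    best: Optional[Layout] = None
    for _ in range(max_iters):
        svg = propose(text, feedback)
        score, feedback = critique(text, svg)
        # Track the best candidate, since a later round may regress.
        if best is None or score > best.score:
            best = Layout(svg, score)
        # Stop early once the layout is judged good enough.
        if best.score >= target:
            break
    assert best is not None  # guaranteed because max_iters >= 1
    return best
```

In practice both callables would wrap model calls; keeping the best-scoring candidate rather than the last one guards against a refinement round that makes the layout worse.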

Paper Structure (33 sections, 1 equation, 21 figures, 12 tables)

Figures (21)

  • Figure 1: The composition of the FigureBench dataset. It features a rich collection of text-figure pairs from four distinct sources (Paper, Survey, Blog, and TextBook), demonstrating the benchmark's capability to evaluate automatic illustration generation across various domains and complexities.
  • Figure 2: An overview of AutoFigure, which decouples structural layout generation from aesthetic rendering. Stage 1 ensures structural fidelity by having a multi-agent system generate and iteratively self-correct a symbolic layout (SVG). Stage 2 renders the validated layout and employs an erase-and-correct module, which uses OCR and cross-verification, to guarantee perfect textual accuracy with high-fidelity vector overlays. This figure was itself produced by AutoFigure and serves as a qualitative showcase of its generation quality.
  • Figure 3: Examples showcasing the versatility of AutoFigure in generating complex scientific illustrations from a diverse range of academic texts. Note that we employ a unified default style (delicate and cute cartoon comic style, using a Morandi color palette) solely to ensure visual consistency and readability for comparative analysis. This is a choice of presentation rather than a limitation of the method; users can freely specify or mix arbitrary styles as needed (see Appendix \ref{app:style}). We present a diverse range of results in Appendix \ref{appendix:cases} to further illustrate our approach.
  • Figure 4: Human evaluation results from 10 first-author experts assessing AI-generated figures for 21 of their own publications. The comprehensive study required experts to perform three tasks: (a) a forced-choice holistic ranking of six AI models against the original reference to determine a win rate, (b) a publication intent selection, and (c-e) multi-dimensional scoring on a 1-5 Likert scale for accuracy, clarity, and aesthetics.
  • Figure 5: Ablation studies of the AutoFigure framework. Subplots compare different backbone models on (a) pre-rendering symbolic layouts versus (b) final rendered outputs. Also shown are (c) performance scaling with increased test-time refinement iterations and (d) the impact of different intermediate sketch formats.
  • ...and 16 more figures
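
The Figure 2 caption describes an erase-and-correct module that checks rendered text with OCR and cross-verification, then overlays corrected text. The sketch below illustrates only the verification step under stated assumptions: OCR output is taken as already available (no OCR library is called), the expected labels are assumed to come from the validated symbolic layout, and `OcrSpan`, `Correction`, `plan_corrections`, and the `min_ratio` threshold are hypothetical names and values, not the paper's API. Fuzzy matching uses `difflib.SequenceMatcher` from the Python standard library.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class OcrSpan:
    text: str  # text as read back from the rendered image
    box: Box   # where that text sits in the image


@dataclass
class Correction:
    box: Box       # region to erase and overlay
    rendered: str  # what the renderer actually drew
    expected: str  # what the symbolic layout says it should be


def plan_corrections(
    spans: List[OcrSpan],
    expected_labels: List[str],
    min_ratio: float = 0.6,  # assumed similarity threshold
) -> List[Correction]:
    """Pair each OCR span with its closest expected label and flag mismatches."""
    corrections: List[Correction] = []
    for span in spans:
        best_label: Optional[str] = None
        best_ratio = 0.0
        for label in expected_labels:
            ratio = SequenceMatcher(None, span.text.lower(), label.lower()).ratio()
            if ratio > best_ratio:
                best_label, best_ratio = label, ratio
        # Skip spans that resemble no label (likely decoration or OCR noise)
        # and spans that already match their label exactly.
        if best_label is None or best_ratio < min_ratio:
            continue
        if span.text != best_label:
            corrections.append(Correction(span.box, span.text, best_label))
    return corrections
```

Each returned `Correction` names a region to erase and the text to redraw there as a vector overlay; how the erasing and overlaying are done is outside this sketch.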