Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

Jiawei Zhou; Chi Zhang; Xiang Feng; Qiming Zhang; Haibo Qiu; Lihuo He; Dengpan Ye; Xinbo Gao; Jing Zhang

Abstract

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task is a non-trivial challenge for the current generation of LMMs: it demands a close synergy between high-fidelity visual perception (to parse intricate spatial hierarchies and symbolic details) and precise generative expression (to synthesize syntactically sound and logically consistent code). Unlike traditional descriptive tasks, Omni-I2C requires holistic understanding, since any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1080 meticulously curated samples and is defined by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark spans a wide spectrum of digital content, from scientific visualizations to complex symbolic notations, each paired with executable reference code. To complement this diversity, our evaluation framework provides the necessary depth: by decoupling performance into perceptual fidelity and symbolic precision, it goes beyond surface-level accuracy to expose the granular structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.

Paper Structure (56 sections, 3 equations, 26 figures, 11 tables)

Introduction
Related Work
Multimodal Code Benchmarks
Visual Perception in LMMs
The Omni-I2C Benchmark
Task Definition
Benchmark Coverage Analysis
Data Curation Process
Evaluation Metrics
Experiment
Baseline Setup
Main Results
Discussion
Study on Evaluation Metrics
Different Prompting Methods
...and 41 more sections

Figures (26)

Figure 1: Taxonomy overview of the Omni-I2C benchmark. The dataset features a diverse distribution across 5 code types, 8 major subjects, and 45 distinct figure types.
Figure 2: The data curation process and evaluation pipeline of Omni-I2C. Data Curation (Left): From raw web collection to a refined benchmark with diverse themes and languages, ensured by strict filtering and human verification. Pipeline (Right): The proposed workflow for image-to-code generation and automatic evaluation.
Figure 3: Representative examples from Gemini 3 Pro.
Figure 4: Comparative Analysis of Error Patterns in Claude Sonnet 4.5, Gemini 3 Pro, and Qwen3-VL-235B-A22B-Instruct. Case studies and evaluations across various languages are detailed in the appendix.
Figure 5: Examples of LLM-based code restructuring with foreground and numerical perturbations.
...and 21 more figures
