Uni-Parser Technical Report

Xi Fang; Haoyi Tao; Shuwen Yang; Suyang Zhong; Haocheng Lu; Han Lyu; Chaozheng Huang; Xinyu Li; Linfeng Zhang; Guolin Ke

TL;DR

Uni-Parser targets industrial-scale parsing of scientific PDFs and patents with a modular, multi-expert architecture that preserves cross-modal alignments across text, formulas, tables, figures, and chemical structures. It combines a group-based layout detection framework, modality-specific modules (OCR, table, formula, chemical, chart), and a distributed, pipeline-parallel infrastructure to achieve high throughput at scale. Key contributions include Uni-Parser-LD for layout detection, SLANet for table structure recognition, MolParser 1.5 for chemical structure recognition, SciParser for figure captions, and a data flywheel engine with Uni-Miner for human-in-the-loop data curation. The system is designed to scale to billions of pages and supports downstream AI4Science tasks, including large-scale data generation for foundation models and the construction of robust domain-specific knowledge bases.

Abstract

This technical report introduces Uni-Parser, an industrial-grade document parsing engine tailored for scientific literature and patents, delivering high throughput, robust accuracy, and cost efficiency. Unlike pipeline-based document parsing methods, Uni-Parser employs a modular, loosely coupled multi-expert architecture that preserves fine-grained cross-modal alignments across text, equations, tables, figures, and chemical structures, while remaining easily extensible to emerging modalities. The system incorporates adaptive GPU load balancing, distributed inference, dynamic module orchestration, and configurable modes that support either holistic or modality-specific parsing. Optimized for large-scale cloud deployment, Uni-Parser achieves a processing rate of up to 20 PDF pages per second on 8 × NVIDIA RTX 4090D GPUs, enabling cost-efficient inference across billions of pages. This level of scalability facilitates a broad spectrum of downstream applications, ranging from literature retrieval and summarization to the extraction of chemical structures, reaction schemes, and bioactivity data, as well as the curation of large-scale corpora for training next-generation large language models and AI4Science models.
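
To put the stated throughput in perspective, the following back-of-the-envelope sketch converts the abstract's figure (up to 20 pages per second on 8 GPUs) into per-GPU throughput and wall-clock time for a one-billion-page corpus. The function names, the corpus size, and the node counts are illustrative assumptions, and the calculation assumes throughput scales linearly with the number of 8-GPU nodes, which the report does not claim. Because 20 pages per second is a peak rate, the results are best read as lower bounds on processing time.

```python
# Back-of-the-envelope throughput arithmetic (illustrative, not from the report).

SECONDS_PER_DAY = 86_400

# Figures stated in the abstract.
PEAK_PAGES_PER_SECOND = 20.0   # "up to 20 PDF pages per second"
GPUS_PER_NODE = 8              # 8 x NVIDIA RTX 4090D


def pages_per_gpu_second(pages_per_second: float, num_gpus: int) -> float:
    """Average throughput attributable to a single GPU."""
    return pages_per_second / num_gpus


def days_to_process(total_pages: float, pages_per_second: float,
                    num_nodes: int = 1) -> float:
    """Wall-clock days to parse a corpus.

    Assumption: throughput scales linearly with the number of nodes.
    """
    seconds = total_pages / (pages_per_second * num_nodes)
    return seconds / SECONDS_PER_DAY


# Hypothetical corpus size and cluster sizes, chosen only for illustration.
CORPUS_PAGES = 1e9

per_gpu = pages_per_gpu_second(PEAK_PAGES_PER_SECOND, GPUS_PER_NODE)
single_node_days = days_to_process(CORPUS_PAGES, PEAK_PAGES_PER_SECOND)
hundred_node_days = days_to_process(CORPUS_PAGES, PEAK_PAGES_PER_SECOND,
                                    num_nodes=100)

print(f"Per-GPU throughput: {per_gpu:.2f} pages/s")
print(f"1 node:    {single_node_days:.1f} days")
print(f"100 nodes: {hundred_node_days:.2f} days")
```

Under these assumptions, a single 8-GPU node at peak rate would need roughly 579 days for one billion pages, while 100 such nodes would need about 5.8 days, which illustrates why distributed inference matters at this scale.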
