Docling Technical Report
Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Nikolaos Livathinos, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Fabian Lindlbauer, Kasper Dinkla, Lokesh Mishra, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar
TL;DR
Docling is an open-source, end-to-end pipeline that converts PDF documents into structured output and runs locally on commodity hardware. It uses specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer). Its modular processing pipeline supports multiple PDF backends, pretrained AI models, and extensible model pipelines, and it produces JSON or Markdown output along with rich metadata. The report demonstrates favorable performance on CPU with options for batching, discusses the trade-offs between alternative backends, and positions Docling as a foundation for downstream AI tasks such as RAG and knowledge extraction, with integration into enterprise data tooling. The authors also outline future extensions and invite community contributions under the MIT license.
Abstract
This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. The code interface allows for easy extensibility and the addition of new features and models.
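To make the extensibility claim concrete, the following is a minimal sketch of a modular page-processing pipeline in which each model is a plain callable applied in sequence. It is not Docling's actual interface: the names Page, toy_layout_model, run_pipeline, export_markdown, and export_json are hypothetical, and the "layout model" is a trivial rule standing in for a real AI model.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Page:
    """Hypothetical page container: text cells plus per-model predictions."""
    number: int
    cells: List[str]
    predictions: Dict[str, List[str]] = field(default_factory=dict)


# Assumption: any callable that enriches a page and returns it can be a stage.
PageModel = Callable[[Page], Page]


def toy_layout_model(page: Page) -> Page:
    # Stand-in for a layout model: cells without a trailing period are titles.
    page.predictions["labels"] = [
        "text" if cell.endswith(".") else "title" for cell in page.cells
    ]
    return page


def run_pipeline(pages: List[Page], models: List[PageModel]) -> List[Page]:
    # Apply every model to every page, in the order the models are listed.
    for model in models:
        pages = [model(page) for page in pages]
    return pages


def export_markdown(pages: List[Page]) -> str:
    # Render title cells as headings and all other cells as paragraphs.
    blocks = []
    for page in pages:
        labels = page.predictions.get("labels", ["text"] * len(page.cells))
        for label, cell in zip(labels, page.cells):
            blocks.append(f"# {cell}" if label == "title" else cell)
    return "\n\n".join(blocks)


def export_json(pages: List[Page]) -> str:
    # Serialize cells together with whatever the models predicted.
    return json.dumps(
        [
            {"page": p.number, "cells": p.cells, "predictions": p.predictions}
            for p in pages
        ]
    )


if __name__ == "__main__":
    pages = [Page(1, ["Docling Technical Report", "This report introduces Docling."])]
    result = run_pipeline(pages, [toy_layout_model])
    print(export_markdown(result))
    print(export_json(result))
```

In this sketch, adding a new model amounts to writing another function with the same signature and appending it to the list passed to run_pipeline; the exporters read only the predictions they know about, so existing stages remain unaffected.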
