FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
TL;DR
The paper tackles visually rich document understanding (VRDU) under data scarcity by introducing FS-DAG, a modular few-shot graph-based framework that integrates domain-specific textual and visual backbones with a graph neural network for Key Information Extraction. It fuses textual and visual features via Kronecker fusion, applies shared position embeddings and multi-head attention in the GNN, and uses carefully designed training strategies to achieve robust few-shot adaptation with under 90M parameters. Extensive experiments on two industry-focused VRDU datasets show FS-DAG achieving state-of-the-art or competitive F1 scores while reducing model size and latency compared to LayoutLMv2/v3 and other graph models, and demonstrating strong robustness to OCR errors. The work also provides ablation evidence that each architectural and training component contributes to performance, reports an industry deployment serving 50+ customers and 1M+ API calls per month, and points toward zero-shot extensions as future work.
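To make the Kronecker fusion step concrete, the following is a minimal NumPy sketch of fusing per-node text and visual features with a row-wise Kronecker product. It is an illustration only: the function name `kronecker_fuse`, the feature sizes, and the sample values are assumptions, and the fusion layer in FS-DAG itself is not described in this summary and may add learned projections or a low-rank approximation.

```python
import numpy as np


def kronecker_fuse(text_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Fuse per-node features with a row-wise Kronecker product (hypothetical helper).

    text_feats:   (num_nodes, d_text) array of textual features
    visual_feats: (num_nodes, d_vis) array of visual features
    returns:      (num_nodes, d_text * d_vis) array of fused features
    """
    if text_feats.shape[0] != visual_feats.shape[0]:
        raise ValueError("both inputs must describe the same number of nodes")
    # Per-node outer product, flattened; this matches np.kron applied to each row pair.
    outer = np.einsum("ni,nj->nij", text_feats, visual_feats)
    return outer.reshape(text_feats.shape[0], -1)


# Toy input: two graph nodes (for example, two OCR text boxes); values are made up.
text = np.array([[1.0, 2.0], [0.0, 1.0]])
visual = np.array([[3.0, 4.0, 5.0], [1.0, 0.0, 2.0]])

fused = kronecker_fuse(text, visual)
print(fused.shape)
print(fused)
```

The fused dimension grows as the product of the two input dimensions, so a plain Kronecker product like this one is usually practical only for small feature sizes.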
Abstract
In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with fewer than 90M parameters, making it well-suited for complex real-world Information Extraction (IE) applications where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on the information extraction task, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
