GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal
TL;DR
GraphKD introduces a graph-based knowledge distillation framework to efficiently transfer knowledge from large teachers to lightweight students for document object detection. By constructing structured RoI-based graphs with nodes representing instances and edges encoding relations, and by applying adaptive text-bias mitigation and a graph distillation loss that combines node and edge imitation, GraphKD enables heterogeneous distillation (e.g., ViT to CNN) and achieves competitive performance with far fewer parameters. Extensive ablations show the importance of edge structure and non-text node distillation, while comparative studies demonstrate superiority over several KD baselines. The work advances edge-deployable document understanding by preserving structural insights through graph topology, though cross-architecture distillation with transformers remains a challenging direction for future work.
Abstract
Object detection in documents is a key step in automating the identification of structural elements in a digital or scanned document, which requires understanding the hierarchical structure of the document and the relationships between its elements. Large, complex models achieve high accuracy but are computationally expensive and memory-intensive, which makes them impractical to deploy on resource-constrained devices. Knowledge distillation makes it possible to build smaller, more efficient models that retain much of the performance of their larger counterparts. We present a graph-based knowledge distillation framework for identifying and localizing document objects in a document image. We design a structured graph whose nodes contain proposal-level features and whose edges represent the relationships between proposal regions. To reduce text bias, we design an adaptive node sampling strategy that prunes the weight distribution and places more weight on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss, which captures local and global information concurrently. Extensive experiments on competitive benchmarks demonstrate that the proposed framework outperforms current state-of-the-art approaches. The code will be available at: https://github.com/ayanban011/GraphKD.
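To make the idea of distilling a proposal graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes that edges are pairwise cosine similarities between proposal features, that node and edge imitation are both measured with mean squared error, that text bias is handled by a fixed down-weighting of text nodes, and that teacher and student features already share the same dimension (a real heterogeneous setup would likely need a projection layer). The names `build_graph`, `node_weights`, `graph_distill_loss`, `text_weight`, and `edge_coef` are hypothetical.

```python
import numpy as np

def build_graph(roi_feats):
    """Nodes are RoI feature vectors; edges are pairwise cosine similarities (assumed)."""
    norms = np.linalg.norm(roi_feats, axis=1, keepdims=True)
    unit = roi_feats / np.clip(norms, 1e-8, None)  # avoid division by zero
    edges = unit @ unit.T                          # (N, N) similarity matrix
    return roi_feats, edges

def node_weights(is_text, text_weight=0.25):
    """Hypothetical stand-in for adaptive sampling: text nodes get less weight."""
    w = np.where(is_text, text_weight, 1.0)
    return w / w.sum()                             # normalize to sum to 1

def graph_distill_loss(teacher_feats, student_feats, is_text, edge_coef=1.0):
    """Weighted node imitation plus edge imitation, both as MSE (assumed form)."""
    t_nodes, t_edges = build_graph(teacher_feats)
    s_nodes, s_edges = build_graph(student_feats)
    w = node_weights(is_text)
    # Per-node squared error, weighted so non-text nodes dominate.
    node_loss = np.sum(w * np.mean((t_nodes - s_nodes) ** 2, axis=1))
    # Structural term: match pairwise relations between proposals.
    edge_loss = np.mean((t_edges - s_edges) ** 2)
    return node_loss + edge_coef * edge_loss

# Toy usage: 5 proposals with 16-dimensional features, 3 of them text regions.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(5, 16))
student = teacher + 0.1 * rng.normal(size=(5, 16))
is_text = np.array([True, True, True, False, False])
loss = graph_distill_loss(teacher, student, is_text)
```

In this sketch the node term carries local, per-proposal information while the edge term carries the global relational structure, which mirrors the local and global split described in the abstract. The fixed `text_weight` is only a placeholder for the adaptive strategy the paper proposes.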
