Hierarchical Document Parsing via Large Margin Feature Matching and Heuristics
Duong Anh Kiet
TL;DR
The paper tackles hierarchical document parsing for visually rich documents in the VRD-IU challenge, focusing on accurate inference of parent–child relations under variable layouts. It introduces a unified framework that combines a large-margin, cosine-based matching loss with greedy, rule-driven hierarchical constraints to assign relationships efficiently. By pairing discriminative feature learning with structure-aware heuristics, the approach placed first in the challenge, reaching a private-leaderboard accuracy of 0.98904. This work demonstrates that integrating deep feature matching with principled rule-based refinements can yield both high accuracy and computational efficiency in complex document understanding tasks.
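The TL;DR names a large-margin, cosine-based matching loss but does not give its exact form. The following is a minimal sketch, assuming a CosFace-style additive margin applied to the cosine similarity between a child element's embedding and the embeddings of its candidate parents. The function names and the `scale` and `margin` values are hypothetical illustrations, not the author's settings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def large_margin_matching_loss(
    child: Sequence[float],
    candidates: Sequence[Sequence[float]],
    target: int,
    scale: float = 30.0,   # hypothetical logit scale
    margin: float = 0.35,  # hypothetical additive cosine margin
) -> float:
    """Additive-margin softmax loss over candidate parents (CosFace-style sketch).

    The margin is subtracted only from the true parent's cosine, so the loss
    stays large until the true parent beats every other candidate by roughly
    `margin` in cosine similarity.
    """
    logits: List[float] = []
    for j, cand in enumerate(candidates):
        cos = cosine(child, cand)
        if j == target:
            cos -= margin  # penalize the correct pair to enforce a margin
        logits.append(scale * cos)
    # Numerically stable log-sum-exp, then cross-entropy for the target index.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


child = [1.0, 0.0]
candidates = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(large_margin_matching_loss(child, candidates, target=0))
```

Subtracting the margin only from the correct pair means that at inference time (where no margin is applied) the true parent tends to win by a comfortable gap, which is the usual motivation for large-margin losses over plain softmax.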
Abstract
We present our solution to the AAAI-25 VRD-IU challenge, which achieved first place in the competition. Our approach integrates a large-margin loss for improved feature discrimination and employs heuristic rules to refine hierarchical relationships. By combining a deep learning-based matching strategy with greedy algorithms, we substantially improve accuracy while maintaining computational efficiency. Our method attains an accuracy of 0.98904 on the private leaderboard, demonstrating its effectiveness in document structure parsing. Source code is publicly available at https://github.com/ffyyytt/VRUID-AAAI-DAKiet
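The abstract pairs learned matching scores with greedy algorithms and heuristic rules but does not list the rules. The following is a minimal sketch of how such a greedy step could look, assuming two illustrative rules: a parent must precede its child in reading order, and a hypothetical label-compatibility table restricts which element types may attach to which. The function name, labels, and score values are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def greedy_assign_parents(
    categories: Sequence[str],
    scores: Sequence[Sequence[float]],
    allowed_parents: Dict[str, Set[str]],
) -> List[int]:
    """Greedily pick, for each element, the best-scoring admissible parent.

    categories[i]: label of element i, with elements in reading order.
    scores[i][j]: matching score between child i and candidate parent j.
    allowed_parents: hypothetical rule table mapping a child label to the
        set of parent labels it may attach to.
    Returns parents, where parents[i] < i or -1 (attach to the root).
    """
    parents: List[int] = []
    for i, label in enumerate(categories):
        best, best_score = -1, float("-inf")
        for j in range(i):  # assumed rule: a parent precedes its child
            if categories[j] not in allowed_parents.get(label, set()):
                continue  # assumed rule: skip incompatible label pairs
            if scores[i][j] > best_score:  # strict '>' keeps the earliest on ties
                best, best_score = j, scores[i][j]
        parents.append(best)
    return parents


categories = ["title", "section", "paragraph", "section", "paragraph"]
allowed = {"section": {"title"}, "paragraph": {"section"}}
scores = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.7, 0.0, 0.0, 0.0],
    [0.6, 0.3, 0.2, 0.0, 0.0],
    [0.1, 0.2, 0.4, 0.9, 0.0],
]
print(greedy_assign_parents(categories, scores, allowed))  # [-1, 0, 1, 0, 3]
```

Each decision costs a single pass over earlier elements, so the greedy step runs in O(n²) time for n elements. This is one way a rule-constrained greedy pass can stay cheap compared with a global search over all possible trees.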
