Hierarchical Document Parsing via Large Margin Feature Matching and Heuristics

Duong Anh Kiet

TL;DR

The paper addresses hierarchical document parsing for visually rich documents in the VRD-IU challenge, where the task is to infer parent–child relations between document elements under variable layouts. It combines a large-margin, cosine-based matching loss with greedy, rule-driven hierarchical constraints to assign relationships efficiently. The approach took first place in the challenge, with a private-leaderboard accuracy of 0.98904. The result suggests that pairing deep feature matching with rule-based refinement can deliver both high accuracy and computational efficiency in complex document understanding tasks.

Abstract

We present our solution to the AAAI-25 VRD-IU challenge, which achieved first place in the competition. Our approach integrates a large margin loss for improved feature discrimination and employs heuristic rules to refine hierarchical relationships. By combining a deep learning-based matching strategy with greedy algorithms, we achieve a significant boost in accuracy while maintaining computational efficiency. Our method attains an accuracy of 0.98904 on the private leaderboard, demonstrating its effectiveness in document structure parsing. Source code is publicly available at https://github.com/ffyyytt/VRUID-AAAI-DAKiet
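
To make the two ingredients named above concrete, the following is a minimal sketch and not the authors' implementation. It assumes a CosFace-style form for the large-margin cosine loss (a margin subtracted from the true parent's cosine before a scaled softmax) and uses a hypothetical rule, "a parent must precede its child in reading order", as a stand-in for the heuristics. The function names, the `margin` and `scale` defaults, and the toy embeddings are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def large_margin_cosine_loss(child, candidates, true_idx, margin=0.35, scale=30.0):
    """Assumed CosFace-style loss: subtract a margin from the true parent's
    cosine, scale all logits, then apply softmax cross-entropy."""
    logits = []
    for i, cand in enumerate(candidates):
        cos = cosine(child, cand)
        if i == true_idx:
            cos -= margin  # make the correct match harder, forcing a larger gap
        logits.append(scale * cos)
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[true_idx]


def greedy_parents(embeddings):
    """Greedily give each element the most similar earlier element as parent.
    The 'parent precedes child' constraint is a hypothetical rule; element 0
    is treated as the root and gets parent -1."""
    parents = [-1]
    for child in range(1, len(embeddings)):
        best = max(range(child),
                   key=lambda p: cosine(embeddings[child], embeddings[p]))
        parents.append(best)
    return parents


# Toy embeddings in reading order (illustrative values only).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
parents = greedy_parents(embeddings)
loss = large_margin_cosine_loss(embeddings[3], embeddings[:3], true_idx=2)
```

The margin only affects training: at inference the sketch ranks candidates by plain cosine similarity, and the rule restricts which candidates are considered at all, so each element is resolved in a single pass over the earlier elements.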

Paper Structure

This paper contains 10 sections, 3 equations, 1 table.