Structure Diagram Recognition in Financial Announcements
Meixuan Qiao, Jun Wang, Junfu Xiang, Qiyu Hou, Ruixuan Li
TL;DR
This paper tackles structured data extraction from structure diagrams in Chinese financial announcements to support timely knowledge-graph construction. It introduces SDR, an oriented-object detection framework that extends Oriented R-CNN with keypoint-aware line detection and a dedicated bus object to robustly capture complex connections such as inclined, curved, and multi-segment polyline connectors. This is coupled with post-processing that merges OCR-recognized text with the detected objects to recover the diagram's topology. To address data scarcity, the authors propose a semi-automated two-stage annotation pipeline: (i) automatic synthesis and annotation of diverse diagrams via Graphviz to train a preliminary model, and (ii) automatic annotation of real-world diagrams with limited manual corrections, facilitated by conversion between the DOTA and COCO annotation formats. The approach is validated on a real-world benchmark derived from Chinese financial announcements, where SDR substantially outperforms Arrow R-CNN and FR-DETR in detecting lines and recovering structured data tuples such as (Owner, Percentage, Owned) and (Supervisor, Subordinate), enabling more accurate and scalable knowledge-graph construction from complex diagrams.
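To make the topology-recovery step concrete, the following is a minimal Python sketch of turning detections into (Owner, Percentage, Owned) tuples. It is not SDR's actual post-processing. It assumes that each connecting line has already been reduced to a start keypoint and an end keypoint (arrowhead at the end), that entity and percentage boxes already carry their OCR text, and that a simple nearest-box rule is enough to attach endpoints and labels. The names Box, Line, nearest and recover_ownership are hypothetical, and bus objects and polyline geometry are deliberately ignored.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


@dataclass
class Box:
    """Hypothetical axis-aligned text box (x0, y0)-(x1, y1) with its OCR text."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def distance_to(self, p: Point) -> float:
        # Euclidean distance from point p to the box; 0 if p lies inside or on it.
        dx = max(self.x0 - p[0], 0.0, p[0] - self.x1)
        dy = max(self.y0 - p[1], 0.0, p[1] - self.y1)
        return hypot(dx, dy)


@dataclass
class Line:
    """Hypothetical detected connecting line reduced to two keypoints (arrowhead at `end`)."""
    start: Point
    end: Point


def nearest(boxes: Sequence[Box], p: Point) -> Box:
    # Box whose boundary is closest to point p.
    return min(boxes, key=lambda b: b.distance_to(p))


def recover_ownership(
    entities: Sequence[Box], percentages: Sequence[Box], lines: Sequence[Line]
) -> List[Tuple[str, Optional[str], str]]:
    """Assumed rule: snap each line's endpoints to the nearest entity boxes and
    attach the percentage label closest to the line's straight-line midpoint."""
    triples = []
    for line in lines:
        owner = nearest(entities, line.start)  # tail of the arrow
        owned = nearest(entities, line.end)    # head of the arrow
        mid = ((line.start[0] + line.end[0]) / 2, (line.start[1] + line.end[1]) / 2)
        pct = nearest(percentages, mid).text if percentages else None
        triples.append((owner.text, pct, owned.text))
    return triples


# Toy detections: one parent company above two subsidiaries.
entities = [Box("Parent Co.", 40, 0, 160, 30), Box("Sub A", 0, 120, 80, 150),
            Box("Sub B", 120, 120, 200, 150)]
percentages = [Box("60%", 50, 70, 75, 85), Box("40%", 125, 70, 150, 85)]
lines = [Line((80, 30), (40, 120)), Line((120, 30), (160, 120))]
print(recover_ownership(entities, percentages, lines))
# [('Parent Co.', '60%', 'Sub A'), ('Parent Co.', '40%', 'Sub B')]
```

A nearest-box rule is the simplest possible choice. It can assign the same percentage label to two lines when labels sit between edges, so a practical pipeline would likely need one-to-one matching or would measure distances along the detected line geometry rather than to a straight-line midpoint.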
Abstract
Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and for improving the efficiency of various financial applications. First, we propose a new method for recognizing structure diagrams in financial announcements that better detects and extracts different types of connecting lines, including straight lines, curves, and polylines at various orientations and angles. Second, we develop a two-stage method to efficiently build the industry's first benchmark of structure diagrams from Chinese financial announcements. In the first stage, a large number of diagrams are synthesized and annotated with an automated tool and used to train a preliminary recognition model with fairly good performance. In the second stage, this preliminary model automatically annotates real-world structure diagrams, and a high-quality benchmark is obtained after only a few manual corrections. Finally, we experimentally verify the significant performance advantage of our structure diagram recognition method over previous methods.
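The first stage of the benchmark construction relies on the fact that, for synthesized diagrams, the structured data is known before the image exists. The following minimal Python sketch illustrates this with Graphviz's DOT language, which the TL;DR names as the synthesis tool. It is an assumption-laden illustration, not the authors' generator: the function synthesize_ownership_dot, the "Company i" node names, and the one-owner star layout are hypothetical. Only the DOT attributes rankdir and splines are real Graphviz settings, randomized here to vary layout direction and produce straight, multi-segment, and curved connectors.

```python
import random
from typing import List, Tuple

# Graphviz edge-routing styles giving straight, multi-segment, and curved lines.
# "ortho" is left out because Graphviz's orthogonal router does not place
# ordinary edge labels.
SPLINES = ["line", "polyline", "curved", "spline"]
RANKDIRS = ["TB", "LR", "BT", "RL"]


def synthesize_ownership_dot(
    n_owned: int, seed: int = 0
) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Return DOT source for one synthetic ownership diagram plus its
    ground-truth (Owner, Percentage, Owned) tuples.

    Hypothetical generator: one owner holds stakes in n_owned companies and
    every edge carries a random percentage label. The tuples are fixed before
    rendering, so they serve as annotations at no labeling cost."""
    rng = random.Random(seed)  # seeded for reproducible diagrams
    owner = "Company 0"
    out = [
        "digraph G {",
        f"  graph [rankdir={rng.choice(RANKDIRS)}, splines={rng.choice(SPLINES)}];",
        "  node [shape=box];",
    ]
    truth = []
    for i in range(1, n_owned + 1):
        owned = f"Company {i}"
        pct = f"{rng.randint(1, 100)}%"
        out.append(f'  "{owner}" -> "{owned}" [label="{pct}"];')
        truth.append((owner, pct, owned))
    out.append("}")
    return "\n".join(out), truth


dot_src, truth = synthesize_ownership_dot(3, seed=42)
print(dot_src)
print(truth)
```

The DOT text can be rendered with the Graphviz CLI (for example `dot -Tpng diagram.dot -o diagram.png`). Node and edge coordinates for box and keypoint annotations could then be read from Graphviz's layout output, such as `dot -Tplain`, although edge coordinates there are spline control points and would need converting into the annotation format a detector expects.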
