ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Furkan Pala; Mehmet Yasin Akpınar; Onur Deniz; Gülşen Eryiğit

TL;DR

The paper tackles key information extraction (KIE) from unstructured financial documents, where multimodal transformers such as ViBERTgrid underperform text-only baselines. By adding a BiLSTM-CRF sequence tagging layer (ViBERTgrid BiLSTM-CRF) and an auxiliary segmentation head, the authors improve named entity recognition (NER) performance on unstructured data while preserving the model's strengths on semi-structured documents. They validate the approach on unstructured Turkish money transfer orders and on the SROIE dataset, and publicly release token-level SROIE annotations to support research on multimodal sequence labeling. The results suggest that combining multimodal representations with sequence-labeling techniques makes KIE more robust in financial workflows, with broader implications for automated document processing.

Abstract

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents, but their investigation on unstructured documents is still an emerging research topic. This paper presents an approach to adapt a multimodal transformer (ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly release token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
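
To make the role of the CRF part of a BiLSTM-CRF layer more concrete, the following is a minimal sketch of Viterbi decoding over a linear-chain CRF in plain Python. It is an illustration only, not the authors' implementation: the function name `viterbi_decode`, the tag set (including the hypothetical `AMOUNT` entity), and all scores are assumptions made up for this example. In a real model the per-token emission scores would be produced by the BiLSTM over the multimodal features, and the transition scores would be learned.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the score of `tag` at token t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`;
    pairs that are missing from the dictionary score 0.
    """
    if not emissions:
        return []

    # Best score of any path that ends in each tag at the first token.
    score = {tag: emissions[0][tag] for tag in tags}
    backpointers: List[Dict[str, str]] = []

    for step in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score + transition.
            best_prev = max(
                tags,
                key=lambda prev: score[prev] + transitions.get((prev, cur), 0.0),
            )
            new_score[cur] = (
                score[best_prev]
                + transitions.get((best_prev, cur), 0.0)
                + step[cur]
            )
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: score[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical BIO tag set and made-up scores for a three-token input.
tags = ["O", "B-AMOUNT", "I-AMOUNT"]
emissions = [
    {"O": 2.0, "B-AMOUNT": 0.5, "I-AMOUNT": 0.0},
    {"O": 0.2, "B-AMOUNT": 1.0, "I-AMOUNT": 1.2},
    {"O": 0.1, "B-AMOUNT": 0.3, "I-AMOUNT": 1.5},
]
# Penalize the invalid BIO transition O -> I-AMOUNT (assumed value).
transitions = {("O", "I-AMOUNT"): -10.0}

# Choosing the best tag per token independently ignores the transitions.
greedy = [max(tags, key=lambda tag: step[tag]) for step in emissions]
decoded = viterbi_decode(emissions, transitions, tags)
print(greedy)   # ['O', 'I-AMOUNT', 'I-AMOUNT']
print(decoded)  # ['O', 'B-AMOUNT', 'I-AMOUNT']
```

In this toy input, tagging each token independently yields an entity that starts with an `I-` tag, whereas decoding with transition scores returns a well-formed BIO sequence. This kind of sequence-level consistency is what a CRF layer contributes on top of per-token scores.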
