Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data

Haitham Hammami, Louis Baligand, Bojan Petrovski

TL;DR

This work tackles address parsing in financial payments, where messages often contain free-text addresses that must be structured for regulatory purposes. It benchmarks LibPostal, DeepParse, Transformer-based models, and decoder-based LLMs on a heavily augmented, noisy synthetic dataset that mimics production SWIFT messages, and introduces three data-augmentation variants to reflect real-world variability. The study finds that well-tuned Transformer models, particularly XLM-RoBERTa-Large with appropriate training strategies, achieve state-of-the-art F1 performance on both synthetic zero-shot and production-like data, while decoder-based LLMs offer promising but less reliable results because of hallucinations and formatting issues. The authors release an open-source augmented dataset and a fine-tuned model to spur further research and practical deployment; future work aims to incorporate context-free grammars and geographic knowledge to further constrain outputs.

Abstract

In the financial industry, identifying the location of parties involved in payments is a major challenge in the context of various regulatory requirements. For this purpose, address parsing entails extracting fields such as street, postal code, or country from free-text message attributes. While payment processing platforms are updating their standards with more structured formats, such as SWIFT with ISO 20022, address parsing remains essential for a considerable volume of messages. With the emergence of Transformers and generative Large Language Models (LLMs), we explore the performance of state-of-the-art solutions given the constraint of processing a vast amount of daily data. This paper also aims to show the need for training robust models capable of dealing with real-world noisy transactional data. Our results suggest that a well fine-tuned Transformer model using early stopping significantly outperforms other approaches. Nevertheless, generative LLMs demonstrate strong zero-shot performance and warrant further investigation.

Paper Structure (18 sections, 2 figures, 8 tables)

Figures (2)

  • Figure 1: Learning curve showing the model overfitting
  • Figure 2: Prompt template used for LLMs