Memory-Augmented Agent Training for Business Document Understanding

Jiale Liu, Yifan Zeng, Malte Højmark-Bertelsen, Marie Normann Gadeberg, Huazheng Wang, Qingyun Wu

TL;DR

The paper tackles the challenge of domain-adapting LLMs to business document understanding, focusing on extracting transport references from invoices. It introduces Matrix, a memory-augmented agent training framework that uses trajectory-based learning, a Reflector for error-driven feedback, and a memory-updating meta-optimizer to progressively refine domain knowledge. In experiments on data from a real-world collaboration with Kuehne+Nagel and on an anonymized open benchmark, Matrix significantly outperforms prompting baselines and vanilla agents, while reducing API calls and enabling the processing of longer documents. The work demonstrates a practical pathway for turning general LLMs into specialized, cost-efficient business tools via systematic memory enhancement in document processing tasks.
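To make the division of labour among the three components concrete, the following is a deliberately toy sketch, not the authors' implementation. In Matrix the agent, Reflector, and optimizer are LLM-driven and the memory holds natural-language domain knowledge; here every piece is a hypothetical stand-in: `Memory` is a list of string hints (reference prefixes), `agent` is a one-step rule matcher, `reflector` turns a failed extraction into a hint, and `meta_optimize` merges hints into memory once per epoch (the per-epoch update schedule is also an assumption). The invoice strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Memory:
    """Hypothetical memory: learned extraction hints (here, reference prefixes)."""
    hints: list[str] = field(default_factory=list)


def agent(invoice: str, memory: Memory) -> Optional[str]:
    """Toy agent: return the first token starting with a remembered prefix."""
    for token in invoice.split():
        if any(token.startswith(hint) for hint in memory.hints):
            return token
    return None


def reflector(prediction: Optional[str], target: str) -> Optional[str]:
    """Toy Reflector: on failure, turn the error into a hint (the target's non-digit prefix)."""
    if prediction == target:
        return None
    prefix = ""
    for ch in target:
        if ch.isdigit():
            break
        prefix += ch
    return prefix or None


def meta_optimize(memory: Memory, feedback: list[str]) -> Memory:
    """Toy memory update: merge new hints into memory, skipping duplicates."""
    merged = list(memory.hints)
    for hint in feedback:
        if hint not in merged:
            merged.append(hint)
    return Memory(hints=merged)


def train(dataset: list[tuple[str, str]], epochs: int) -> tuple[Memory, list[float]]:
    """Run the agent, collect Reflector feedback, and update memory after each epoch."""
    memory = Memory()
    success_rates = []
    for _ in range(epochs):
        feedback = []
        correct = 0
        for invoice, target in dataset:
            prediction = agent(invoice, memory)  # a one-step "trajectory"
            correct += prediction == target
            hint = reflector(prediction, target)
            if hint is not None:
                feedback.append(hint)
        success_rates.append(correct / len(dataset))
        memory = meta_optimize(memory, feedback)
    return memory, success_rates


# Invented example invoices: (document text, ground-truth transport reference)
invoices = [
    ("Invoice 991 total 120 EUR ref SHP-4411", "SHP-4411"),
    ("Consignment BOL-7720 for 3 pallets", "BOL-7720"),
    ("Order 55 shipped under SHP-9001", "SHP-9001"),
]
memory, rates = train(invoices, epochs=3)
print(memory.hints, rates)  # ['SHP-', 'BOL-'] [0.0, 1.0, 1.0]
```

The first epoch fails everywhere because memory starts empty; the Reflector's feedback is then consolidated into memory, and later epochs succeed without any change to the agent itself, which mirrors, in miniature, the idea of improving an agent by refining its memory rather than its weights.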

Abstract

Traditional enterprises face significant challenges in processing business documents, where tasks like extracting transport references from invoices remain largely manual despite their crucial role in logistics operations. While Large Language Models offer potential for automation, applying them directly to specialized business domains often yields unsatisfactory results. We introduce Matrix (Memory-Augmented agent Training through Reasoning and Iterative eXploration), a novel paradigm that enables LLM agents to progressively build domain expertise through experience-driven memory refinement and iterative learning. To validate this approach, we collaborate with one of the world's largest logistics companies to create a dataset of invoice documents in Universal Business Language (UBL) format, focusing on the task of transport reference extraction. Experiments demonstrate that Matrix outperforms prompting a single LLM by 30.3% and a vanilla LLM agent by 35.2%. We further analyze the metrics of the optimized systems and observe that the agent system requires fewer API calls, incurs lower costs, and can analyze longer documents on average. Our method establishes a new approach to transforming general-purpose LLMs into specialized business tools through systematic memory enhancement in document processing tasks.

Paper Structure

This paper contains 24 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: The training and inference pipeline of Matrix.
  • Figure 2: Comparison between Matrix and baselines with gpt-4o and gpt-4o-mini as backbone model. Matrix leverages gpt-4o for optimization in both cases. Surprisingly, gpt-4o-mini performs better after optimization.
  • Figure 3: Success rate across epochs for agents with gpt-4o and gpt-4o-mini as the backbone model.
  • Figure 4: Average number of API calls needed to solve a task. The average decreases steadily as training progresses.
  • Figure 5: Average cost of API calls after each epoch. The cost decreases as training progresses.