TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding
Chin-Chia Michael Yeh, Uday Singh Saini, Xin Dai, Xiran Fan, Shubham Jain, Yujie Fan, Jiarui Sun, Junpeng Wang, Menghai Pan, Yingtong Dou, Yuzhong Chen, Vineeth Rakesh, Liang Wang, Yan Zheng, Mahashweta Das
TL;DR
TREASURE is a Transformer-based foundation model tailored to high-volume transaction data that unifies cardholder behavior with payment-network signals. It uses dedicated input handling for static and dynamic attributes, a decoder-only Transformer with causal masking, and two output heads: one predicts the attributes of the next transaction, the other predicts the network signals of the current transaction. The training objective combines an abnormality-driven loss with an efficient approximation for predicting high-cardinality attributes, based on InfoNCE with shared negatives, which enables scalable learning on billions of records. Empirically, TREASURE surpasses production baselines in abnormal behavior detection by 111%, and its embeddings improve downstream merchant recommendation by 104%. The results also show evidence of data-driven scaling and interpretable embeddings. The work positions TREASURE as a practical, scalable platform for tabular and sequential transaction modeling, with implications for fraud detection, personalization, and future integration with graph- and LLM-based systems.
Abstract
Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.
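To make the second capability concrete, the following is a minimal sketch of an InfoNCE loss with shared negatives, the approximation named in the TL;DR for predicting high-cardinality categorical attributes. It is an illustration under stated assumptions, not the authors' implementation: the function name, the dot-product scoring, the temperature value, the toy vocabulary of 1,000 merchant IDs, and the noisy stand-in for the Transformer output are all hypothetical. The idea shown is that each example in a batch is scored against its true category plus one set of sampled negatives reused by the whole batch, so the cost per example depends on the number of negatives rather than on the full vocabulary size.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_shared_negatives(contexts, positives, negatives, temperature=0.1):
    """Mean InfoNCE loss with one negative set shared by the whole batch.

    contexts:  list of B context vectors (stand-ins for model outputs)
    positives: list of B embeddings of the true category for each example
    negatives: list of K embeddings sampled once and reused by every example
    """
    total = 0.0
    for ctx, pos in zip(contexts, positives):
        # Logit 0 is the true category; the remaining K are the shared negatives.
        logits = [dot(ctx, pos) / temperature]
        logits += [dot(ctx, neg) / temperature for neg in negatives]
        # Numerically stable log-sum-exp over the K + 1 candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log softmax probability of the true category.
        total += log_z - logits[0]
    return total / len(contexts)


# Toy setup (all sizes are hypothetical): 1,000 merchant IDs, 8-dim embeddings.
rng = random.Random(0)
vocab_size, dim, batch_size, num_negatives = 1000, 8, 4, 16
table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]

target_ids = [rng.randrange(vocab_size) for _ in range(batch_size)]
positives = [table[i] for i in target_ids]

# Stand-in for the Transformer output: the target embedding plus small noise.
contexts = [[x + rng.gauss(0, 0.1) for x in emb] for emb in positives]

# Sample the negatives once, excluding the batch targets, and share them.
candidate_ids = [i for i in range(vocab_size) if i not in set(target_ids)]
negative_ids = rng.sample(candidate_ids, num_negatives)
negatives = [table[i] for i in negative_ids]

loss = info_nce_shared_negatives(contexts, positives, negatives)
print(f"InfoNCE loss over {num_negatives} shared negatives: {loss:.4f}")
```

In this sketch each example is compared against 17 candidates (1 positive plus 16 shared negatives) instead of all 1,000 IDs. Sharing the negatives also means their embeddings are looked up once per batch, which is one plausible reason the approach suits very large categorical vocabularies.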
