A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers

Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, C. Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum

TL;DR

This work proposes a simple but flexible baseline: a standard autoregressive LLM-style transformer with elementary positional embeddings, trained with a causal language modeling objective. It outperforms existing approaches across popular datasets and can be employed for various use cases.

Abstract

Many real-world applications of tabular data involve using historical events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use cases, for example that we know the labels of all historical events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.
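The causal language modeling objective is the standard next-token setup used to train LLMs. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes events have already been mapped to integer token ids, and the function names are hypothetical. Inputs are the sequence without its last token, targets are the same sequence shifted left by one, "elementary" positional embeddings are indexed by plain absolute positions, and a lower-triangular mask keeps each position from attending to later ones.

```python
from typing import List, Tuple


def next_token_pairs(tokens: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a token sequence into (inputs, targets, position ids) for causal LM training.

    Hypothetical helper for illustration; not taken from the paper.
    """
    inputs = tokens[:-1]   # the model sees every token except the last one
    targets = tokens[1:]   # and is trained to predict the token that follows each input
    positions = list(range(len(inputs)))  # plain absolute positions, no time-aware encoding
    return inputs, targets, positions


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. only when j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    inputs, targets, positions = next_token_pairs([5, 12, 7, 3])
    print(inputs, targets, positions)  # [5, 12, 7] [12, 7, 3] [0, 1, 2]
    for row in causal_mask(3):
        print(row)
```

Under this setup, labels and features are simply tokens in one vocabulary, so a single next-token objective can in principle serve label prediction, feature imputation, and event-sequence modeling, depending on which positions predictions are read from.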

Paper Structure

This paper contains 26 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Event Data Pipeline: top: STEP accepts sequences of discrete events (in this case, credit card transactions) as inputs and predicts the next token/label in a sequence. middle: Each event is broken into a short string of tokens, with each feature represented by a distinct token. STEP models each event one feature at a time. Note that while the features of an event occur at the same time, this tokenization scheme adds a causal bias that must be handled by the model. bottom: After tokenization, event sequences are packed for training with an [EOS] token separating independent sequences, just as text is packed for training an autoregressive LLM.
  • Figure 2: STEP Event Processing. STEP is a decoder-only transformer model, which means that each event is passed in sequentially with a causal mask applied.
  • Figure 3: Event Data Preprocessing pipeline
  • Figure 4: Sequence length/label position trade-off. AUC score for STEP across various sequence lengths and label positions for the Amazon Electronics Dataset.
  • Figure 5: STEP with partial masking
  • ...and 2 more figures
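The tokenization and packing steps described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the STEP preprocessing pipeline: the field names, the `field=value` token format, and the pre-binned amount strings are hypothetical, and the actual pipeline (Figure 3) may discretize numeric features differently. Each event contributes one token per feature in a fixed field order, independent sequences are joined with an `[EOS]` token, and the resulting id stream is cut into fixed-length blocks; that last chunking step is an assumption borrowed from common LLM practice.

```python
from typing import Dict, List

EOS = "[EOS]"


def tokenize_event(event: Dict[str, str], fields: List[str]) -> List[str]:
    # One token per feature, in a fixed field order (hypothetical "field=value" format).
    return [f"{field}={event[field]}" for field in fields]


def pack_sequences(sequences: List[List[Dict[str, str]]], fields: List[str]) -> List[str]:
    # Concatenate the tokens of each event sequence and separate independent sequences with [EOS].
    packed: List[str] = []
    for seq in sequences:
        for event in seq:
            packed.extend(tokenize_event(event, fields))
        packed.append(EOS)
    return packed


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    # Assign integer ids in order of first appearance.
    vocab: Dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def chunk(ids: List[int], block_size: int) -> List[List[int]]:
    # Cut the packed id stream into consecutive blocks; the final block may be shorter.
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]


if __name__ == "__main__":
    fields = ["merchant", "amount", "is_fraud"]  # hypothetical schema; label placed last
    sequences = [
        [{"merchant": "grocery", "amount": "10-20", "is_fraud": "no"}],
        [{"merchant": "travel", "amount": "500+", "is_fraud": "yes"}],
    ]
    packed = pack_sequences(sequences, fields)
    vocab = build_vocab(packed)
    ids = [vocab[t] for t in packed]
    print(packed)
    print(chunk(ids, 3))  # [[0, 1, 2], [3, 4, 5], [6, 3]]
```

The fixed field order is what introduces the causal bias noted in the Figure 1 caption: a feature placed later in the event can condition on earlier features of the same event, but not the other way around.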