
A Fast and Effective Solution to the Problem of Look-ahead Bias in LLMs

Humzah Merchant, Bradford Levy

TL;DR

This work tackles look-ahead bias in finance by proposing inference-time unlearning, a method that steers an LLM's outputs without retraining it, using two small specialized models: one fine-tuned on the information to be forgotten and one fine-tuned on the information to be retained. Through Divergence Decoding, the base model's logits are adjusted via linear or rank-based modifications, removing targeted knowledge while preserving general performance. The approach, grounded in Product of Experts and importance sampling, demonstrates strong unlearning performance on the MUSE benchmark and on finance tasks (M&A unlearning and debiasing future performance), with substantial efficiency gains, including the viable use of trigram models. The method enables reliable evaluation of chronologically sensitive predictions in finance and potentially in broader domains where training on future data is undesirable.
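The linear logit modification can be sketched under one assumption: that it takes the product-of-experts form z = z_base + alpha * (z_retain - z_forget), where alpha is a hypothetical strength parameter. Because the softmax normalizers cancel, this gives the same distribution as p_base * (p_retain / p_forget)^alpha after renormalization, which the sketch checks numerically. The function names and toy logits are illustrative only, and the rank-based variant is not shown.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_divergence_logits(base, retain, forget, alpha=1.0):
    """Hypothetical linear adjustment: shift base logits by alpha * (retain - forget).

    Assumes all three models share one vocabulary, so index i is the same token in each.
    """
    return [b + alpha * (r - f) for b, r, f in zip(base, retain, forget)]

def product_of_experts(base, retain, forget, alpha=1.0):
    """The same distribution written as p_base * (p_retain / p_forget) ** alpha, renormalized."""
    pb, pr, pf = softmax(base), softmax(retain), softmax(forget)
    unnorm = [b * (r / f) ** alpha for b, r, f in zip(pb, pr, pf)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy 4-token vocabulary: token 2 stands in for a "future" fact that the forget model knows well.
base = [1.0, 0.5, 3.0, 0.2]
retain = [1.2, 0.4, 0.1, 0.3]
forget = [0.9, 0.5, 4.0, 0.2]

adjusted = softmax(linear_divergence_logits(base, retain, forget, alpha=1.0))
poe = product_of_experts(base, retain, forget, alpha=1.0)
print("base    :", [round(p, 3) for p in softmax(base)])
print("adjusted:", [round(p, 3) for p in adjusted])
```

Token 2 dominates the base distribution but is heavily suppressed after the adjustment, because the forget model assigns it much more mass than the retain model does.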

Abstract

Applying LLMs to predictive tasks in finance is challenging due to look-ahead bias resulting from their training on long time-series data. This precludes the backtests typically employed in finance since retraining frontier models from scratch with a specific knowledge cutoff is prohibitive. In this paper, we introduce a fast, effective, and low-cost alternative. Our method guides generation at inference time by adjusting the logits of a large base model using a pair of smaller, specialized models: one fine-tuned on information to be forgotten and another on information to be retained. We demonstrate that our method effectively removes both verbatim and semantic knowledge, corrects biases, and outperforms prior methods.
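Guiding generation at inference time means the base model's weights never change; only its next-token logits are adjusted at every decoding step. The loop below is a minimal sketch assuming greedy decoding and the same linear rule z_base + alpha * (z_retain - z_forget). The model interface (context in, logits out), the `alpha` parameter, and the toy bigram "models" are all hypothetical stand-ins for real LLMs.

```python
from typing import Callable, List, Sequence

Logits = List[float]
Model = Callable[[Sequence[int]], Logits]  # hypothetical interface: context -> next-token logits

def guided_greedy_decode(base: Model, retain: Model, forget: Model,
                         prompt: Sequence[int], steps: int, alpha: float = 1.0) -> List[int]:
    """Greedy decoding where each step picks the argmax of base + alpha * (retain - forget)."""
    tokens = list(prompt)
    for _ in range(steps):
        zb, zr, zf = base(tokens), retain(tokens), forget(tokens)
        adjusted = [b + alpha * (r - f) for b, r, f in zip(zb, zr, zf)]
        tokens.append(max(range(len(adjusted)), key=adjusted.__getitem__))
    return tokens

def table_model(table: List[Logits]) -> Model:
    """Toy bigram 'model': next-token logits depend only on the last token."""
    return lambda ctx: list(table[ctx[-1]])

# 3-token toy vocabulary; after token 0, the base and forget models both favor token 2.
base = table_model([[0.0, 1.0, 3.0], [2.0, 0.0, 0.5], [2.0, 0.5, 0.0]])
retain = table_model([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
forget = table_model([[0.0, 0.0, 4.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

print("base only:", guided_greedy_decode(base, retain, forget, [0], steps=2, alpha=0.0))
print("guided   :", guided_greedy_decode(base, retain, forget, [0], steps=2, alpha=1.0))
```

Setting `alpha=0.0` recovers plain greedy decoding from the base model, which makes it easy to compare generations with and without the adjustment.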

Paper Structure

This paper contains 22 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: MUSE Results: Target is the model to which unlearning is applied. Retrain is the result of retraining from scratch, which is the best but also the most costly option. Closer to Retrain is better.
  • Figure 2: Performance on finance-specific tasks. 99% confidence intervals presented where applicable.
  • Figure 3: All hyperparameter and model size configurations
  • Figure 4: Analysis of Model Scaling and Over- or Under-Unlearning on MUSE
  • Figure 5: MUSE Scaling and Sustainability. The left column shows sustainability (consecutive forget sets of the same size) and the right column shows scaling (increasingly large forget sets). We evaluate both utility on the retain set and utility on the original forget set, to ensure that forgetting of the original data is not lost, and report the Euclidean distance to Retrain, with Target's distance set to 100%.
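One reading of the Figure 5 normalization, which is our assumption rather than a formula stated here: each model is a point of metrics (retain-set utility, original-forget-set utility), its Euclidean distance to Retrain is divided by Target's distance to Retrain, and the ratio is expressed as a percentage, so Target sits at 100% and a perfect match to Retrain at 0%. The function name is hypothetical.

```python
import math
from typing import Sequence

def normalized_distance_to_retrain(method: Sequence[float], target: Sequence[float],
                                   retrain: Sequence[float]) -> float:
    """Euclidean distance from `method` to Retrain, as a percentage of Target's distance.

    Each argument is a vector of metrics, e.g. (retain-set utility, forget-set utility).
    100.0 means no closer to Retrain than Target; 0.0 means identical to Retrain.
    """
    d_target = math.dist(target, retrain)  # math.dist requires Python 3.8+
    if d_target == 0:
        raise ValueError("Target coincides with Retrain; the normalization is undefined")
    return 100.0 * math.dist(method, retrain) / d_target

# Illustrative placeholder metrics only, not results from the paper.
print(normalized_distance_to_retrain(method=(0.58, 0.35), target=(0.60, 0.80), retrain=(0.55, 0.30)))
```

Values below 100% mean the unlearned model is closer to Retrain than the original Target is; values above 100% would indicate moving further away.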