DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining

Yutong Yan; Raphael Tang; Zhenyu Gao; Wenxi Jiang; Yao Lu

Abstract

In financial backtesting, large language models pretrained on internet-scale data risk introducing lookahead bias that undermines their forecasting validity, as they may have already seen the true outcome during training. To address this, we present DatedGPT, a family of twelve 1.3B-parameter language models, each trained from scratch on approximately 100 billion tokens of temporally partitioned data with strict annual cutoffs spanning 2013 to 2024. We further enhance each model with instruction fine-tuning on both general-domain and finance-specific datasets curated to respect the same temporal boundaries. Perplexity-based probing confirms that each model's knowledge is effectively bounded by its data cutoff year, while evaluation on standard benchmarks shows competitive performance with existing models of similar scale. We provide an interactive web demo that allows users to query and compare responses from models across different cutoff years.
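
The temporal partitioning described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the document records, the `visible_to` helper, and the convention that a model with cutoff year Y sees only documents dated strictly before 1 January of Y are all assumptions made for illustration (the last loosely follows the Figure 1 caption, which describes DatedGPT-2020 as trained on data available before 2020).

```python
from datetime import date

# Twelve annual cutoffs, 2013 through 2024 (one per model in the family).
CUTOFF_YEARS = range(2013, 2025)


def visible_to(cutoff_year, docs):
    """Return the documents dated strictly before 1 January of cutoff_year.

    Assumption: the cutoff is exclusive of the cutoff year itself.
    """
    boundary = date(cutoff_year, 1, 1)
    return [d for d in docs if d["date"] < boundary]


# Hypothetical toy corpus; real pretraining data would carry its own timestamps.
docs = [
    {"id": "a", "date": date(2012, 6, 1)},
    {"id": "b", "date": date(2019, 12, 31)},
    {"id": "c", "date": date(2020, 1, 1)},
    {"id": "d", "date": date(2022, 11, 30)},
]

# One partition per cutoff year; each is a superset of the previous one.
partitions = {year: visible_to(year, docs) for year in CUTOFF_YEARS}

for year in (2013, 2020, 2024):
    print(year, [d["id"] for d in partitions[year]])
```

Under this convention a document dated 1 January 2020 is excluded from the 2020 partition and first appears in the 2021 partition, so a backtest that uses the 2020 model cannot draw on it.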

Paper Structure (25 sections, 4 figures, 6 tables)

Introduction
Time-Aware Dataset Curation
Pretraining Data with Cutoff Dates
Instruction-Following Data with Cutoff Dates
General-domain instruction dataset curation
Time-sensitive instruction dataset curation
Time-Aware Model Training
Model architecture and hyperparameters
Pretraining Setup
Instruction-Tuning Setup
Experiment Results
Evaluation setup
Baseline models
Our models
Evaluation datasets
...and 10 more sections

Figures (4)

Figure 1: Web interface for DatedGPT. When queried about OpenAI's chatbot, the DatedGPT-2020 model, trained exclusively on data available before 2020, is unaware of ChatGPT.
Figure 2: Average relative perplexity of DatedGPT-base-2020 evaluated on quarterly public company news headlines from 2013 to 2024.
Figure 3: Training loss curves for 2013 and 2024 DatedGPT-base models. All models exhibit smooth convergence without loss spikes or instability.
Figure 4: Relative perplexity of DatedGPT-base-2017 evaluated on quarterly public company news headlines from 2013 to 2024.
