Attention Factors for Statistical Arbitrage

Elliot L. Epstein, Rose Wang, Jaewon Choi, Markus Pelger

TL;DR

The paper tackles statistical arbitrage by proposing an end-to-end Attention Factor Model that jointly learns tradable factors and arbitrage portfolio allocations from firm-characteristic embeddings. It combines conditional latent factors with a LongConv sequence model to extract time-series mispricing signals from residuals, optimizing performance after transaction costs. Empirically, the approach yields an out-of-sample annualized Sharpe ratio above 4 without frictions and about 2.3 after costs on 24 years of U.S. equity data, outperforming PCA-based and OU-threshold benchmarks. The findings highlight the importance of weak factors and show that end-to-end optimization with cost-aware objectives significantly improves profitability and interpretability through industry-aligned factor structure.

Abstract

Statistical arbitrage exploits temporal price differences between similar assets. We develop a framework to jointly identify similar assets through factors, identify mispricing and form a trading policy that maximizes risk-adjusted performance after trading costs. Our Attention Factors are conditional latent factors that are the most useful for arbitrage trading. They are learned from firm characteristic embeddings that allow for complex interactions. We identify time-series signals from the residual portfolios of our factors with a general sequence model. Estimating factors and the arbitrage trading strategy jointly is crucial to maximize profitability after trading costs. In a comprehensive empirical study we show that our Attention Factor model achieves an out-of-sample Sharpe ratio above 4 on the largest U.S. equities over a 24-year period. Our one-step solution yields an unprecedented Sharpe ratio of 2.3 net of transaction costs. We show that weak factors are important for arbitrage trading.
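
To make the gross versus net Sharpe ratio distinction concrete, the following is a minimal sketch of how an annualized Sharpe ratio could be computed before and after a proportional trading cost. The cost rate `cost_per_unit_turnover`, the 252-day annualization, and the toy returns and weights are assumptions for illustration only; this section does not specify the paper's cost model, and none of the numbers below come from the paper.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio: mean / sample std, scaled by sqrt(periods)."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)


def net_returns(gross_returns, weights, cost_per_unit_turnover=0.0005):
    """Subtract a hypothetical proportional cost on turnover.

    Turnover at time t is sum_i |w_{t,i} - w_{t-1,i}|, starting from zero holdings.
    """
    net = []
    prev = [0.0] * len(weights[0])
    for r, w in zip(gross_returns, weights):
        turnover = sum(abs(a - b) for a, b in zip(w, prev))
        net.append(r - cost_per_unit_turnover * turnover)
        prev = w
    return net


# Toy data (illustrative only): four periods, two assets, long-short weights.
gross = [0.01, -0.005, 0.02, 0.0]
weights = [[0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, 0.5]]
net = net_returns(gross, weights)

sharpe_gross = annualized_sharpe(gross)
sharpe_net = annualized_sharpe(net)
```

In this toy example the portfolio trades in the first and third periods, so only those returns are reduced, and the net Sharpe ratio falls below the gross one. A strategy trained on a cost-aware objective would be pushed to trade less often for the same signal.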

Paper Structure

This paper contains 42 sections, 17 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Conceptual Attention Factor Model
  • Figure 2: Cumulative Returns of Arbitrage Portfolios
  • Figure 3: Interpretation of Attention Factor Betas
  • Figure 4: Attention Weights for Factor Portfolio Weights
  • Figure : The figure illustrates the conceptual structure of the Attention Factor model. Left: Attention factors are constructed by computing scaled inner products between the embedded characteristics of each asset and the $K$ query vectors $Q_k$. Right: The statistical arbitrage methodology. First, for each asset, a replicating portfolio based on the attention factors is created, giving a residual mispricing. Second, a series of lagged residuals is used to construct the portfolio weights in the residual space, using a Long Convolution model for sequence modeling. Finally, the portfolio weights are mapped back to the asset space via a composition matrix, giving the next-period portfolio return.
  • ...and 3 more figures
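
The first steps described in the conceptual-structure caption above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the softmax normalization across assets, the function names, the toy embeddings, and the loadings `betas` passed in by hand are all assumptions. The Long Convolution step and the composition matrix are omitted.

```python
import math


def attention_factor_weights(embeddings, queries):
    """Return a K x N matrix of factor portfolio weights.

    embeddings: N x d embedded characteristics, one row per asset.
    queries:    K x d query vectors Q_k.
    Each row is a softmax over assets of <Q_k, e_i> / sqrt(d).
    The softmax normalization is an assumption of this sketch.
    """
    d = len(embeddings[0])
    weights = []
    for q in queries:
        scores = [sum(qj * ej for qj, ej in zip(q, e)) / math.sqrt(d)
                  for e in embeddings]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([x / total for x in exps])
    return weights


def factor_returns(weights, asset_returns):
    """Factor return k is the weighted sum of asset returns under row k."""
    return [sum(w * r for w, r in zip(row, asset_returns)) for row in weights]


def residuals(asset_returns, betas, factors):
    """Residual of asset i: r_i minus its factor-replicating portfolio return.

    betas is an N x K matrix of hypothetical loadings supplied by the caller.
    """
    return [r - sum(b * f for b, f in zip(beta_i, factors))
            for r, beta_i in zip(asset_returns, betas)]


# Toy example: three assets, two-dimensional embeddings, one query vector.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[1.0, 0.0]]
returns = [0.01, -0.02, 0.03]

W = attention_factor_weights(embeddings, queries)
f = factor_returns(W, returns)
eps = residuals(returns, [[0.0], [0.0], [0.0]], f)
```

With the query aligned to the first embedding dimension, assets 1 and 3 receive equal weight and asset 2 receives less. With all loadings set to zero the residuals equal the raw returns, which is a convenient sanity check before plugging in estimated loadings.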