Deep Learning for Financial Time Series: A Large-Scale Benchmark of Risk-Adjusted Performance

Adir Saly-Kaufmann; Kieran Wood; Jan Peter-Calliess; Stefan Zohren

Abstract

We present a large-scale benchmark of modern deep learning architectures for a financial time series prediction and position sizing task, with a primary focus on Sharpe ratio optimization. Evaluating linear models, recurrent networks, transformer-based architectures, state space models, and recent sequence representation approaches, we assess out-of-sample performance on a daily futures dataset covering commodities, equity indices, bonds, and FX from 2010 to 2025. Our evaluation goes beyond average returns and includes statistical significance, downside and tail risk measures, breakeven transaction cost analysis, robustness to random seed selection, and computational efficiency. We find that models explicitly designed to learn rich temporal representations consistently outperform linear benchmarks and generic deep learning models, which often lead the ranking in standard time series benchmarks. The hybrid model VSN with LSTM, a combination of Variable Selection Networks (VSN) and LSTMs, achieves the highest overall Sharpe ratio, while VSN with xLSTM and LSTM with PatchTST exhibit superior downside-adjusted characteristics. xLSTM demonstrates the largest breakeven transaction cost buffer, indicating improved robustness to trading frictions.

Paper Structure (98 sections, 49 equations, 27 figures, 11 tables)

Introduction
Contributions.
Architectures
Problem Setup
Trading Signal Generation
Portfolio Construction
End-to-end Optimization
Net Returns and Breakeven Transaction Costs
Linear Baselines
Autoregressive Model (AR1x).
DLinear and NLinear.
Transformer-Based Architectures Without Explicit Recurrence
iTransformer.
PatchTST.
State-Space and Implicitly Recurrent Models
...and 83 more sections

Figures (27)

Figure 1: End-to-end portfolio optimization pipeline: Statistical and technical indicators are extracted from historical close prices, serving as the predictive model's inputs. The model outputs are transformed into portfolio weights via a linear projection followed by a hyperbolic tangent activation. Training is performed by minimizing the negative Sharpe Ratio.
Figure 2: Performance comparison across models: gross PnL rescaled to 10% volatility.
Figure 3: Distribution of daily returns. To make the central mass visible, the figure focuses on the bulk of the distribution; tail behavior is examined separately.
Figure 4: Distribution of daily realized volatility (log scale). Volatility exhibits strong right skewness and a long upper tail.
Figure 5: Left: Quantile-quantile plot against the normal distribution. Right: Tail behavior of daily returns. The figures indicate substantial deviations from Gaussianity and heavy-tailed return dynamics.
...and 22 more figures
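
The pipeline described in the Figure 1 caption (model outputs mapped to portfolio weights by a linear projection followed by a hyperbolic tangent, trained by minimizing the negative Sharpe ratio) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the array shapes, the equal-weight averaging across assets, and the 252-day annualization factor are all assumptions made for illustration.

```python
import numpy as np


def positions_from_features(features, w, b=0.0):
    """Linear projection followed by tanh, giving positions bounded by -1 and 1.

    features: array of shape (T, N, F) (days, assets, features) -- assumed layout.
    w:        projection vector of shape (F,) -- hypothetical parameter.
    """
    return np.tanh(features @ w + b)


def negative_sharpe(positions, next_returns, periods_per_year=252, eps=1e-9):
    """Negative annualized Sharpe ratio of the resulting portfolio returns.

    positions[t] is assumed to be held over the period earning next_returns[t].
    Averaging equally across assets is an assumption, not taken from the paper.
    """
    portfolio_returns = (positions * next_returns).mean(axis=1)
    mean = portfolio_returns.mean()
    std = portfolio_returns.std()
    return -np.sqrt(periods_per_year) * mean / (std + eps)


# Synthetic data only; no real market data is used here.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 4, 3))
returns = rng.normal(scale=0.01, size=(300, 4))
w = rng.normal(size=3)

loss = negative_sharpe(positions_from_features(features, w), returns)
```

In an end-to-end setup, a loss of this form would be minimized with respect to the model parameters by an automatic differentiation framework. The NumPy version above only evaluates the objective.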
