On the Effectiveness of Classical Regression Methods for Optimal Switching Problems

Martin Andersson, Benny Avelin, Marcus Olofsson

TL;DR

This work investigates solving high-dimensional optimal switching (OS) problems via regression-based Monte Carlo within the Longstaff-Schwartz framework. It systematically compares classical regression methods (OLS, Ridge, LASSO, Random Forests, LightGBM, $k$-NN, PCA-$k$-NN) with neural networks, and shows that simple methods, particularly PCA-adjusted $k$-NN regression, can achieve near-optimal switching decisions in dimensions up to $d=50$ with minimal hyperparameter tuning. The authors provide theoretical concentration bounds for $k$-NN regression in one-step settings under diffusion and jump-diffusion dynamics (sub-Gaussian and sub-exponential tails) and demonstrate empirical robustness across four benchmark OS problems, including high-dimensional cases. In practice, the results suggest trying classical regression methods before resorting to deep learning for OS, and using PCA to scale $k$-NN to high dimensions while maintaining strong decision quality and value capture.

Abstract

Simple regression methods provide robust, near-optimal solutions for optimal switching problems, including high-dimensional ones (up to 50 dimensions). While the theory requires solving intractable PDE systems, the Longstaff-Schwartz algorithm with classical regression methods achieves excellent switching decisions without extensive hyperparameter tuning. Testing linear models (OLS, Ridge, LASSO), tree-based methods (random forests, gradient boosting), $k$-nearest neighbors, and feedforward neural networks on four benchmark problems, we find that several simple methods maintain stable performance across diverse problem characteristics, outperforming the neural networks we tested against. In our comparison, $k$-NN regression performs consistently well with minimal hyperparameter tuning. We establish concentration bounds for this regressor and show that PCA enables $k$-NN to scale to high dimensions.
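
As a rough illustration of the regression step described above, the following sketch performs one backward-induction step of a Longstaff-Schwartz-style scheme with a brute-force $k$-NN regressor. It is a minimal sketch under stated assumptions, not the authors' implementation: the exact recursion is not shown in this summary, and the names `knn_predict`, `backward_step`, `payoff`, `cost`, `dt` and the toy two-mode problem are all hypothetical. The regression estimate is used only to choose the mode, while the realised next-step value is propagated along each path.

```python
import math
import random


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def backward_step(x_now, v_next, payoff, cost, dt, k):
    """One backward-induction step over all modes (hypothetical helper).

    x_now[p]      state of path p at time t_n (a tuple of floats)
    v_next[j][p]  value of being in mode j at t_{n+1}, realised on path p
    payoff(j, x)  running payoff rate f_j(x)
    cost(i, j, x) cost c_ij(x) of switching from mode i to mode j
    """
    n_modes, n_paths = len(v_next), len(x_now)
    # Regress next-step values on the current state, one k-NN fit per mode.
    # Predictions are in-sample: each path counts among its own neighbours.
    cont = [[knn_predict(x_now, v_next[j], x_now[p], k) for p in range(n_paths)]
            for j in range(n_modes)]
    values = [[0.0] * n_paths for _ in range(n_modes)]
    decisions = [[0] * n_paths for _ in range(n_modes)]
    for i in range(n_modes):
        for p, x in enumerate(x_now):
            # Score each target mode with the regression estimate ...
            scores = [payoff(j, x) * dt + cont[j][p] - cost(i, j, x)
                      for j in range(n_modes)]
            best = max(range(n_modes), key=scores.__getitem__)
            decisions[i][p] = best
            # ... but propagate the realised value along the path.
            values[i][p] = payoff(best, x) * dt + v_next[best][p] - cost(i, best, x)
    return values, decisions


if __name__ == "__main__":
    random.seed(0)
    paths = [(random.gauss(0.0, 1.0),) for _ in range(200)]
    terminal = [[0.0] * len(paths) for _ in range(2)]  # assumed zero terminal value
    vals, decs = backward_step(
        paths, terminal,
        payoff=lambda j, x: x[0] if j == 1 else 0.0,  # mode 1 earns x, mode 0 is idle
        cost=lambda i, j, x: 0.0 if i == j else 0.1,  # constant switching cost
        dt=1.0, k=10,
    )
    print("share of paths switching 0 -> 1:", sum(decs[0]) / len(paths))
```

The brute-force neighbour search costs quadratic time in the number of paths, which is acceptable for a toy example. For high-dimensional states one would first project `x_now` onto a few principal components, in the spirit of the PCA-$k$-NN variant mentioned above.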

Paper Structure

This paper contains 42 sections, 14 theorems, 109 equations, 7 figures, 11 tables, 1 algorithm.

Key Result

Theorem 5.3

Let \eqref{eq:transition:bounds} hold, assume that $0 \leq \widehat{V}_i(t_{n+1},x) \leq L (\|x\|+1)$ for some $L > 0$, and assume that the functions $f_j \geq 0$ and $c_{ij}$ (with $c_{ii}=0$) are globally Lipschitz continuous in $x$ (with constant $L$) and satisfy the same growth condition as $\widehat{V}_i(t_{n+1},x)$ [...] where $p_{r}^c = \mathbb{P}(X_{t_n} \notin B(x,r))$ and $B(x,r)$ is the Euclidean ball of radius [...]
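
The quantity $p_r^c$ in the statement is the probability that the state $X_{t_n}$ falls outside the ball $B(x,r)$. To make the symbol concrete, the sketch below estimates it by Monte Carlo. The standard Gaussian state distribution, the treatment of the ball as closed, and the helper name `prob_outside_ball` are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def prob_outside_ball(samples, center, radius):
    """Fraction of samples at Euclidean distance greater than `radius` from `center`."""
    outside = sum(1 for s in samples if math.dist(s, center) > radius)
    return outside / len(samples)


if __name__ == "__main__":
    random.seed(1)
    d = 2
    # Illustrative assumption: X_{t_n} is standard Gaussian in R^d.
    draws = [tuple(random.gauss(0.0, 1.0) for _ in range(d)) for _ in range(10_000)]
    print(prob_outside_ball(draws, (0.0, 0.0), 1.5))
```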

Figures (7)

  • Figure 1: Model scaling analysis: (a) training time vs. number of samples, (b) training time vs. dimension, (c) prediction time. Training times reflect the cost of training one model for a single time step and mode; total algorithm runtime follows the complexity formula in \ref{sec:algorithm}. A full run of the algorithm in \ref{sec:algorithm} took from minutes to an hour, depending on the regression model, the hyperparameters, and the experiment.
  • Figure 2: Performance metrics across different strategies for four experiments when starting in state $1$: (a) CL experiment (\ref{sec:numerical:carmona}), (b) high-dimensional CL experiment with $d=50$ (\ref{sec:numerical:carmona_dim}), (c) ACLP experiment (\ref{sec:numerical:aid}), and (d) BSP experiment (\ref{sec:numerical:stapel}). Starting values for the paths were generated from the same distributions as $X_0$ in the respective equations for each experiment. For Internal Consistency and Value Capture, we report the mean at time $t_0$ over all paths. A total of 1000 paths were generated for each experiment. Among the linear models, only the best-performing one is plotted.
  • Figure 3: How our three measures scale with dimension in \ref{sec:numerical:carmona_dim} when starting in state $1$. Note that for $d>2$, $k$-NN is actually PCA-$k$-NN.
  • Figure 4: Comparison of switching strategies
  • Figure 5: Comparison of switching strategies
  • ...and 2 more figures

Theorems & Definitions (31)

  • Remark 3.1: On the number of regressions
  • Remark 5.1
  • Remark 5.2
  • Theorem 5.3
  • Theorem 5.4
  • Remark 5.5
  • Theorem 5.6
  • Remark 5.7
  • Remark 6.1
  • Lemma C.1
  • ...and 21 more