Balancing Interpretability and Performance in Reinforcement Learning: An Adaptive Spectral Based Linear Approach

Qianxin Yi, Shao-Bo Lin, Jun Fan, Yao Wang

TL;DR

This work tackles the tension between interpretability and performance in reinforcement learning by recasting batch Q-learning as a multi-stage linear regression and introducing an adaptive spectral based linear Q-learning framework. The core method uses a spectral filter $g_{\lambda_t}$ applied to the empirical covariance and an adaptive regularization schedule $\lambda_t$ guided by a bias–variance trade-off, yielding interpretable linear policies with competitive accuracy. Theoretical results establish near-optimal parameter estimation and generalization bounds under geometrically $\tau$-mixing data and effective-dimension conditions, while comprehensive experiments on synthetic data and real-world Kuaishou and Taobao datasets demonstrate improved decision quality and transparent policy explanations. The approach also provides practical insights for feature selection and model simplification, reinforcing its potential for scalable, trustworthy management decisions in sequential decision problems.
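The "multi-stage linear regression" view of batch Q-learning can be illustrated with a small backward-recursion sketch. This is not the authors' implementation: the block-per-action feature map `features`, the array layout, and the caller-supplied per-stage regularization list `lams` are assumptions made for illustration, and plain ridge regression stands in for the general spectral filter. The paper's adaptive rule for choosing $\lambda_t$ is not reproduced here.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Ridge estimate (Phi^T Phi / n + lam I)^{-1} Phi^T y / n."""
    n, d = Phi.shape
    return np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(d), Phi.T @ y / n)

def features(state, action, n_actions):
    """Hypothetical feature map: one copy of the state vector per action."""
    d = state.shape[-1]
    phi = np.zeros(n_actions * d)
    phi[action * d:(action + 1) * d] = state
    return phi

def backward_linear_q(states, actions, rewards, n_actions, lams):
    """Fit one linear Q-function per stage, from the last stage backwards.

    states:  (n, T, d) array, states[i, t] is observed before action t
    actions: (n, T) integer array
    rewards: (n, T) array
    lams:    length-T sequence of regularization parameters (assumed given)
    """
    n, T = actions.shape
    thetas = [None] * T
    next_value = np.zeros(n)  # the Q-function after the final stage is taken to be 0
    for t in reversed(range(T)):
        Phi = np.stack([features(states[i, t], actions[i, t], n_actions)
                        for i in range(n)])
        y = rewards[:, t] + next_value  # regression target for stage t
        thetas[t] = ridge_fit(Phi, y, lams[t])
        # Greedy value of the stage-t state, used as the target term for stage t-1.
        q_all = np.array([[features(states[i, t], a, n_actions) @ thetas[t]
                           for a in range(n_actions)] for i in range(n)])
        next_value = q_all.max(axis=1)
    return thetas

def greedy_action(theta, state, n_actions):
    """Action with the largest fitted Q-value at the given state."""
    return int(np.argmax([features(state, a, n_actions) @ theta
                          for a in range(n_actions)]))
```

Because each stage is a linear fit, the entries of `thetas[t]` can be read directly as per-feature, per-action weights, which is the sense in which the resulting policy is interpretable.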

Abstract

Reinforcement learning (RL) has been widely applied to sequential decision making, where interpretability and performance are both critical for practical adoption. Current approaches typically focus on performance and rely on post hoc explanations to account for interpretability. Different from these approaches, we focus on designing an interpretability-oriented yet performance-enhanced RL approach. Specifically, we propose a spectral based linear RL method that extends the ridge regression-based approach through a spectral filter function. The proposed method clarifies the role of regularization in controlling estimation error and further enables the design of an adaptive regularization parameter selection strategy guided by the bias-variance trade-off principle. Theoretical analysis establishes near-optimal bounds for both parameter estimation and generalization error. Extensive experiments on simulated environments and real-world datasets from Kuaishou and Taobao demonstrate that our method either outperforms or matches existing baselines in decision quality. We also conduct interpretability analyses to illustrate how the learned policies make decisions, thereby enhancing user trust. These results highlight the potential of our approach to bridge the gap between RL theory and practical decision making, providing interpretability, accuracy, and adaptability in management contexts.
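To make concrete how a spectral filter generalizes ridge regression, the following minimal sketch applies a filter function to the eigenvalues of the empirical covariance. The function names, the $1/n$ normalization, and the two filters shown (Tikhonov and spectral cut-off) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ridge_filter(eigvals, lam):
    """Tikhonov (ridge) filter: g_lam(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (eigvals + lam)

def cutoff_filter(eigvals, lam):
    """Spectral cut-off: invert eigenvalues >= lam and zero out the rest (lam > 0)."""
    out = np.zeros_like(eigvals)
    keep = eigvals >= lam
    out[keep] = 1.0 / eigvals[keep]
    return out

def spectral_linear_fit(X, y, lam, filt=ridge_filter):
    """Estimate theta = g_lam(Sigma) X^T y / n with Sigma = X^T X / n."""
    n = X.shape[0]
    cov = X.T @ X / n                      # empirical covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(cov) # symmetric eigendecomposition
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    g = filt(eigvals, lam)                 # filter acts on the spectrum only
    moment = X.T @ y / n
    return eigvecs @ (g * (eigvecs.T @ moment))
```

With `ridge_filter` this reproduces the usual ridge solution $(\Sigma + \lambda I)^{-1} X^\top y / n$; swapping in another filter changes only how small eigenvalues are damped, which is where the regularization parameter enters the bias-variance trade-off.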

Paper Structure

This paper contains 35 sections, 26 theorems, 177 equations, 11 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $0 \leq \delta \leq 1/2$ satisfy $\delta \geq 2 \exp \left\{-\frac{\sqrt{2 r+s}}{(\log d)^{\frac{1}{\gamma_0}}\sqrt{\log_q(|D|_\gamma^{-1/2})}}|D|_\gamma^{\frac{r}{4 r+2 s+1}} \right\}$. Under Assumptions assump:mixing to assump:conditional probability, with $r \geq 0$ and $0\leq s \leq 1$, if […], where $C(T,\mu)=\sum_{t=1}^T \mu^{\frac{t}{2}} C_{c}\sum_{\ell=t}^T\bigl((T-\ell+2)M+ M\prod_{k=\cdots}\cdots$ […]

Figures (11)

  • Figure 1: Motivation behind this work
  • Figure 2: Interpretability and performance trade-off
  • Figure 3: Parameter gap and policy gap on simulation data
  • Figure 4: Visualization of feature weights across different time steps on synthetic data
  • Figure 5: Cumulative reward comparison
  • ...and 6 more figures

Theorems & Definitions (29)

  • Definition 1
  • Definition 2: $\tau$-mixing, maume2006exponential
  • Theorem 1
  • Remark 1
  • Lemma 1: Corollary 3.7 in blanchard2019concentration
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 19 more