FR-LUX: Friction-Aware, Regime-Conditioned Policy Optimization for Implementable Portfolio Management

Jian'an Zhang

TL;DR

FR-LUX targets robust, implementable portfolio policies under realistic trading frictions and regime shifts. It is a friction-aware, regime-conditioned reinforcement learning framework that optimizes after-cost rewards by embedding a microstructure-consistent execution model in the reward, constraining turnover with a trade-flow trust region, and conditioning policy behavior on regime states. The theoretical guarantees cover monotone improvement under a KL trust region, long-run turnover bounds, inaction bands induced by proportional costs, robustness to cost misspecification, and a value advantage from regime conditioning. Empirically, FR-LUX achieves the top after-cost Sharpe ratio across a 4-regime by 5-cost grid (20 scenarios), maintains a flatter cost–performance slope, and shows positive, statistically significant improvements over strong baselines, while remaining implementable through regime-aware cost calibration and scenario-level inference. The results suggest that integrating execution into the learning loop yields credible gains for live portfolio management when liquidity and regime dynamics are modeled explicitly.

Abstract

Transaction costs and regime shifts are major reasons why paper portfolios fail in live trading. We introduce FR-LUX (Friction-aware, Regime-conditioned Learning under eXecution costs), a reinforcement learning framework that learns after-cost trading policies and remains robust across volatility-liquidity regimes. FR-LUX integrates three ingredients: (i) a microstructure-consistent execution model combining proportional and impact costs, directly embedded in the reward; (ii) a trade-space trust region that constrains changes in inventory flow rather than logits, yielding stable low-turnover updates; and (iii) explicit regime conditioning so the policy specializes to LL/LH/HL/HH states without fragmenting the data. On a 4 x 5 grid of regimes and cost levels with multiple random seeds, FR-LUX achieves the top average Sharpe ratio with narrow bootstrap confidence intervals, maintains a flatter cost-performance slope than strong baselines, and attains superior risk-return efficiency for a given turnover budget. Pairwise scenario-level improvements are strictly positive and remain statistically significant after multiple-testing corrections. We provide formal guarantees on optimality under convex frictions, monotonic improvement under a KL trust region, long-run turnover bounds and induced inaction bands due to proportional costs, positive value advantage for regime-conditioned policies, and robustness to cost misspecification. The methodology is implementable: costs are calibrated from standard liquidity proxies, scenario-level inference avoids pseudo-replication, and all figures and tables are reproducible from released artifacts.
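
As a rough illustration of ingredient (i), the sketch below computes a one-period after-cost reward from a proportional cost and an impact cost. The quadratic impact term, the convention that post-trade weights earn the period return, and all names (`after_cost_reward`, `prop_cost_bps`, `impact_coef`) are assumptions made for this example; the abstract does not specify the paper's exact functional form.

```python
from typing import Sequence


def after_cost_reward(
    w_prev: Sequence[float],
    w_new: Sequence[float],
    asset_returns: Sequence[float],
    prop_cost_bps: float,
    impact_coef: float,
) -> float:
    """Gross portfolio return minus proportional and impact trading costs.

    Assumed (hypothetical) cost model:
      proportional cost = (prop_cost_bps / 1e4) * sum(|trade_i|)
      impact cost       = impact_coef * sum(trade_i ** 2)
    """
    if not (len(w_prev) == len(w_new) == len(asset_returns)):
        raise ValueError("weights and returns must have the same length")

    # Trade flow: change in portfolio weights requested by the policy.
    trades = [new - old for new, old in zip(w_new, w_prev)]

    # Assumption: the post-trade weights earn the period's asset returns.
    gross = sum(w * r for w, r in zip(w_new, asset_returns))

    proportional = (prop_cost_bps / 1e4) * sum(abs(t) for t in trades)
    impact = impact_coef * sum(t * t for t in trades)
    return gross - proportional - impact
```

Because both cost terms are convex in the trade vector and vanish when no trade occurs, a policy trained on this reward is penalized for turnover directly, rather than through a separate post-hoc constraint.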

Paper Structure

This paper contains 68 sections, 13 theorems, 50 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1

Under the standing assumptions (labelled ass:mdp through ass:policy in the source), the discounted control problem with after‑cost rewards admits an optimal stationary Markov policy $\pi^\star$. Moreover, there exists a deterministic selector $\pi^\star(s,z)\in\arg\max_{a\in\mathcal{W}}Q^{\pi^\star}(s,z,a)$.
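
For orientation, the deterministic selector can be read off the standard Bellman optimality equation of a discounted problem. The display below is a sketch in assumed notation, not the paper's own statement: $r^{\mathrm{net}}$ is the after-cost reward (the symbol appears in Lemma 2), $\gamma\in(0,1)$ is an assumed discount factor, $z$ is taken to be the regime state, and $(s',z')$ is the next state and regime.

$$
V^\star(s,z)=\max_{a\in\mathcal{W}}\Big\{r^{\mathrm{net}}(s,z,a)+\gamma\,\mathbb{E}\big[V^\star(s',z')\mid s,z,a\big]\Big\},
\qquad
Q^{\pi^\star}(s,z,a)=r^{\mathrm{net}}(s,z,a)+\gamma\,\mathbb{E}\big[V^\star(s',z')\mid s,z,a\big].
$$

Any measurable choice of maximizer of $Q^{\pi^\star}(s,z,\cdot)$ over $\mathcal{W}$ then gives a deterministic stationary policy that attains $V^\star$.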

Figures (4)

  • Figure 1: Top methods by Sharpe (95% bootstrap CI). Bars show scenario-mean Sharpe with seeds averaged first; whiskers are percentile CIs. All statistics are computed on after-cost returns.
  • Figure 2: Cost robustness. Scenario-mean Sharpe versus cost (bps). Shaded bands are $\pm 1$ standard error across regimes (HAC). The slope for FR‑LUX is the flattest (smallest in magnitude) among the compared methods, consistent with friction-aware learning.
  • Figure 3: Regime profile (mean Sharpe). The color scale is centered at zero, making positive vs. negative cells directly comparable. FR‑LUX attains consistently positive Sharpe across all volatility–liquidity regimes.
  • Figure 4: Per-scenario pairwise Sharpe differences ($\Delta S$). Each box summarizes the distribution of $\Delta S$ across the 20 scenarios (regime $\times$ cost), with seeds averaged within scenario. A horizontal zero line aids interpretation; stars (reported in the replication tables) indicate sign-test significance after Romano–Wolf stepdown. Takeaway: FR-LUX exhibits strictly positive and precisely estimated improvements over both PPO and turnover-capped mean–variance.
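
The scenario-level inference described in the captions for Figures 1 and 4 can be sketched as follows: seeds are averaged within each scenario first, so the scenario (not the seed) is the resampling unit, and a sign test is applied to per-scenario differences. This is a minimal stdlib sketch under assumptions: the function names, the number of bootstrap draws, and the input layout are hypothetical, and the Romano–Wolf stepdown adjustment mentioned in Figure 4 is not implemented here.

```python
import math
import random
import statistics


def scenario_mean_sharpe_ci(sharpe_by_scenario, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the scenario-mean Sharpe ratio.

    sharpe_by_scenario maps a scenario key, e.g. (regime, cost_bps),
    to a list of per-seed Sharpe ratios (assumed input layout).
    """
    # Average seeds within each scenario first to avoid pseudo-replication.
    scenario_means = [statistics.mean(v) for v in sharpe_by_scenario.values()]
    n = len(scenario_means)
    rng = random.Random(seed)

    # Resample scenarios with replacement and recompute the mean each time.
    boot = sorted(
        statistics.mean(rng.choices(scenario_means, k=n)) for _ in range(n_boot)
    )
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.mean(scenario_means), lo, hi


def sign_test_pvalue(diffs):
    """Exact two-sided sign test on per-scenario differences (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in nonzero)
    # Probability of a split at least as lopsided as (k, n - k) under p = 1/2.
    tail = sum(math.comb(n, i) for i in range(max(k, n - k), n + 1)) / 2**n
    return min(1.0, 2 * tail)
```

With 20 scenarios, a strictly positive difference in every scenario gives an unadjusted two-sided sign-test p-value of $2/2^{20}$, which is why uniformly positive $\Delta S$ can remain significant after a multiple-testing correction.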

Theorems & Definitions (26)

  • Theorem 1: Existence of an optimal stationary policy
  • Lemma 1: Performance‑difference with frictions
  • Theorem 2: Monotonic improvement under a KL trust region
  • Corollary 1: Clipped PPO with trade‑space penalty
  • Proposition 1: Long‑run turnover bound
  • Proposition 2: Inaction (no‑trade) band in 1D
  • Theorem 3: Approximation advantage of regime conditioning
  • Theorem 4: After‑cost robustness
  • Proposition 3: CVaR surrogate and alternating minimization
  • Lemma 2: Continuity and boundedness of $r^{\mathrm{net}}$
  • ...and 16 more
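
To give some intuition for the inaction band of Proposition 2, the sketch below solves a textbook one-period, one-asset mean-variance problem with a proportional cost: maximize $\mu w - \tfrac{\lambda}{2}\sigma^2 w^2 - \kappa\,|w - w_0|$ over the new weight $w$. This setup, the symbols $\lambda$ (risk aversion) and $\kappa$ (proportional cost), and the function name are assumptions chosen for illustration; the paper's proposition may be stated for a different, dynamic problem.

```python
def no_trade_band_target(w0, mu, sigma, risk_aversion, prop_cost):
    """Optimal post-trade weight in a 1D mean-variance problem with linear cost.

    Returns (target_weight, (lower_edge, upper_edge)). Inside the band the
    marginal benefit of trading is below the marginal cost, so w0 is kept;
    outside it, the optimum is to trade only to the nearest band edge.
    """
    if sigma <= 0 or risk_aversion <= 0 or prop_cost < 0:
        raise ValueError("need sigma > 0, risk_aversion > 0, prop_cost >= 0")

    curvature = risk_aversion * sigma**2
    w_star = mu / curvature            # frictionless optimum
    half_width = prop_cost / curvature  # band half-width grows with the cost
    lower, upper = w_star - half_width, w_star + half_width

    # Project the current weight onto the band [lower, upper].
    target = min(max(w0, lower), upper)
    return target, (lower, upper)
```

For example, with `mu=0.05`, `sigma=0.2`, `risk_aversion=5.0` and `prop_cost=0.001`, the frictionless weight is 0.25 and the band is roughly [0.245, 0.255]: a starting weight of 0.252 is left untouched, while 0.5 is reduced only to the upper edge.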