Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control

Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

Abstract

Stock markets exhibit regime-dependent behavior: prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions, respectively, and (3) a Soft Actor-Critic (SAC) reinforcement learning controller that tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component lets the system learn its own regime boundaries, so that an anomaly is effectively defined as a market state in which the standard prediction approach fails. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 show that the proposed framework achieves 0.68% MAPE for one-day predictions without the reinforcement controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework. The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: removing autoencoder routing degrades MAPE by 36% in relative terms, removing the SAC controller by 15%, and removing the dual-path architecture by 7%.
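
The abstract reports results as MAPE, directional accuracy, and relative MAPE degradation. As a reading aid, the following is a minimal sketch of these quantities under their standard textbook definitions; the paper's exact formulas are not given in this excerpt, so the function names and the sign-based definition of directional accuracy are assumptions made here for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition, assumed)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def directional_accuracy(previous, actual, predicted):
    """Percent of forecasts whose predicted move from the previous price
    has the same sign as the realized move (assumed definition)."""
    def sign(x):
        return (x > 0) - (x < 0)

    hits = sum(
        sign(a - prev) == sign(p - prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return 100.0 * hits / len(actual)


def relative_degradation(full_mape, ablated_mape):
    """Relative increase in MAPE, in percent, when a component is removed."""
    return 100.0 * (ablated_mape - full_mape) / full_mape


# Toy prices (hypothetical, not data from the paper).
prev_close = [100.0, 100.0, 100.0, 100.0]
next_close = [101.0, 99.0, 102.0, 98.0]
forecast = [100.5, 100.2, 101.0, 97.0]

print(mape(next_close, forecast))
print(directional_accuracy(prev_close, next_close, forecast))
# MAPE values quoted in the abstract: 0.59% (full system), 0.68% (no SAC controller).
print(relative_degradation(0.59, 0.68))
```

Applied to the two MAPE values quoted in the abstract (0.59% with the controller, 0.68% without), `relative_degradation` gives roughly 15%, which is consistent with the ablation figure reported for the SAC controller.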

Paper Structure

This paper contains 40 sections, 21 equations, 9 figures, and 10 tables.

Figures (9)

  • Figure 1: System architecture overview. Market features $\mathbf{x}_t$ enter the autoencoder, which produces reconstruction error $e_t$ (shown on arrow). The router directs data to normal or event node transformer pathways based on whether $e_t$ exceeds the learned threshold $\tau$. Each pathway produces a prediction ($y^{N}_{t+h}$, $y^{E}_{t+h}$), and adaptive blending combines them into the final forecast $\hat{y}_{t+h}$. The SAC controller observes evaluation metrics and adjusts both $\tau$ and $\alpha$ (dashed blue arrows) to optimize forecasting accuracy.
  • Figure 2: Feature engineering pipeline. Raw OHLCV data is processed through technical indicator computations (SMA, EMA, RSI, MACD, volatility). All features undergo expanding-window z-score normalization to prevent look-ahead bias, producing prediction features and router-specific features.
  • Figure 3: Autoencoder architecture for regime detection. The encoder compresses the input feature vector through two hidden layers (64, 32 units) to a latent representation $\mathbf{z}_t$ of dimension $d_z = 32$. The decoder reconstructs the input through symmetric layers. Reconstruction error $e_t$ serves as the anomaly score for regime classification.
  • Figure 4: Dual node transformer architecture. The router directs data based on reconstruction error. The normal pathway (left, orange) processes typical market conditions with base features. The event pathway (right, blue) augments inputs with event context features $\mathbf{c}_t$. Both pathways follow the same architectural design (layer count, attention heads, model dimension) but maintain independently trained weights and differ in input dimensionality, as the event pathway accepts additional context features. Outputs are blended with adaptive weight $\alpha$.
  • Figure 5: Graph representation of stock relationships (representative subset of 11 stocks shown for clarity; full graph contains all 20 stocks). Nodes represent individual stocks, colored by sector. Solid edges indicate same-sector connections with higher learned weights (annotated values show correlation-based initialization from training data). Dashed edges represent weaker cross-sector correlations that are learned during training.
  • ...and 4 more figures
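
Two steps named in the captions above can be illustrated with a small sketch: the expanding-window z-score normalization from Figure 2, and the threshold routing with blending from Figures 1 and 4. The captions do not give formulas, so everything specific here is an assumption: statistics are taken over observations up to and including time $t$, the reconstruction error $e_t$ is taken to be a mean squared error, and $\alpha$ is treated as the weight on the pathway selected by the router. All function names are hypothetical.

```python
import math


def expanding_zscore(values, min_periods=2):
    """Z-score each value using only observations up to and including it,
    so no future data leaks into the normalization (Welford running stats)."""
    out, n, mean, m2 = [], 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < min_periods or m2 == 0.0:
            out.append(None)  # not enough history, or zero variance so far
        else:
            std = math.sqrt(m2 / (n - 1))  # sample standard deviation
            out.append((x - mean) / std)
    return out


def reconstruction_error(x, x_hat):
    """Assumed anomaly score: mean squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def route(e_t, tau):
    """Send the sample to the event pathway when e_t exceeds the threshold tau."""
    return "event" if e_t > tau else "normal"


def blend(e_t, tau, alpha, y_normal, y_event):
    """Assumed blending rule: alpha weights the routed pathway's prediction,
    and (1 - alpha) weights the other pathway's prediction."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if route(e_t, tau) == "event":
        return alpha * y_event + (1.0 - alpha) * y_normal
    return alpha * y_normal + (1.0 - alpha) * y_event


# Hypothetical values for illustration only.
z = expanding_zscore([1.0, 2.0, 3.0])
e_t = reconstruction_error([0.0, 1.0], [1.0, 0.0])
y_hat = blend(e_t, tau=0.5, alpha=0.75, y_normal=100.0, y_event=110.0)
print(z, e_t, y_hat)
```

Because each normalized value depends only on its own history, appending later observations leaves earlier z-scores unchanged, which is the property that rules out look-ahead bias. In the described system $\tau$ and $\alpha$ would be set by the SAC controller; here they are fixed constants.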