Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts

Toshihide Ubukata, Zhiyao Wang, Enhong Mu, Jialong Li, Kenji Tei

TL;DR

The paper proposes a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism, treating each expert's anisotropic generalization as a complementary specialization. The mixture substantially expands the solvable parameter space and improves robustness compared to any single expert.

Abstract

On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
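
To make the idea of soft gating concrete, the following is a minimal sketch, not the paper's method. The abstract does not give the gating formula, so everything here is an assumption made for illustration: the function names, the use of the maximum softmax probability as an expert's "confidence", and the rule that an expert's weight is proportional to its prior times its confidence. Each hypothetical expert scores the same set of candidate exploration actions, and the mixture picks the action with the highest weighted probability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(priors, confidences):
    """Assumed gate: weight_i is proportional to prior_i * confidence_i."""
    raw = [p * c for p, c in zip(priors, confidences)]
    total = sum(raw)
    if total == 0:
        # Fall back to a uniform mixture if every product is zero.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def soft_moe_select(expert_scores, priors):
    """Blend per-expert action distributions and return the best action.

    expert_scores: one list of raw action scores per expert, all over the
                   same candidate actions (hypothetical interface).
    priors:        one non-negative prior weight per expert.
    """
    probs = [softmax(scores) for scores in expert_scores]
    # Assumed confidence measure: how peaked each expert's distribution is.
    confidences = [max(p) for p in probs]
    weights = gate_weights(priors, confidences)
    n_actions = len(expert_scores[0])
    mixed = [
        sum(weights[i] * probs[i][a] for i in range(len(probs)))
        for a in range(n_actions)
    ]
    best = max(range(n_actions), key=mixed.__getitem__)
    return best, mixed, weights


if __name__ == "__main__":
    # Expert 0 strongly prefers action 0; expert 1 is nearly indifferent.
    scores = [[2.0, 0.0, 0.0], [0.0, 0.1, 0.0]]
    best, mixed, weights = soft_moe_select(scores, priors=[0.5, 0.5])
    print(best, mixed, weights)
```

Because the blend is soft, no expert is switched off entirely: a confident expert dominates the decision, while a less confident one still contributes in proportion to its weight.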

Paper Structure (21 sections, 5 equations, 7 figures, 2 algorithms)

Figures (7)

  • Figure 1: Zero-Shot Generalization of Single RL Experts in Success Rates.
  • Figure 2: Zero-Shot Generalization of Single RL Experts in Exploration Efficiency (Number of Steps).
  • Figure 3: Dominant Expert Selection Map.
  • Figure 4: Overall Performance Comparison.
  • Figure 5: Success Heatmaps of MoE Mixtures (T1--T6).
  • ...and 2 more figures