Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts

Toshihide Ubukata; Zhiyao Wang; Enhong Mu; Jialong Li; Kenji Tei

TL;DR

A Soft Mixture-of-Experts framework combines multiple RL experts via a prior-confidence gating mechanism, treating the experts' anisotropic generalization behaviors as complementary specializations. It substantially expands the solvable parameter space and improves robustness compared to any single expert.

Abstract

On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
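The idea of softly combining experts can be illustrated with a minimal sketch. This is not the paper's gating rule: the function names, the use of each expert's top action probability as its "confidence", and the multiplicative combination with a per-expert prior weight are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_select(expert_scores: Sequence[Sequence[float]],
                    prior_strength: Sequence[float]) -> int:
    """Return the index of the action preferred by a soft mixture of experts.

    expert_scores[k][a]: expert k's score for candidate action a.
    prior_strength[k]:   hypothetical non-negative prior weight of expert k
                         (an assumption, not the paper's definition).
    """
    num_actions = len(expert_scores[0])
    dists = []  # per-expert action distributions
    gate = []   # unnormalized gate value per expert
    for scores, prior in zip(expert_scores, prior_strength):
        probs = softmax(scores)
        dists.append(probs)
        # Assumed confidence measure: the expert's top action probability.
        gate.append(prior * max(probs))
    z = sum(gate)
    if z == 0:
        raise ValueError("at least one expert needs a positive prior")
    weights = [g / z for g in gate]
    # Weighted average of the experts' action distributions.
    mixed = [sum(w * d[a] for w, d in zip(weights, dists))
             for a in range(num_actions)]
    return max(range(num_actions), key=mixed.__getitem__)
```

With equal priors, an expert that is sharply peaked on one action outweighs a near-uniform one; setting an expert's prior to zero removes its influence entirely. A soft combination like this lets every expert contribute in proportion to its weight, rather than committing to a single expert per decision.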

Paper Structure (21 sections, 5 equations, 7 figures, 2 algorithms)

Introduction
Background
On-the-fly Directed Controller Synthesis
Discrete Event Systems and Synthesis
Exploration Optimization Problem
RL-based Exploration Policy
Mixture-of-Experts and Gating Strategies
Proposal: Soft Mixture-of-Experts
Training Phase: Prior Strength Construction
Synthesis Phase: Prior-Confidence Gating Mechanism
Evaluation
Experiment Setup
RQ1: Anisotropic Generalization
RQ2: Effectiveness
RQ3: Computation Cost
...and 6 more sections

Figures (7)

Figure 1: Zero-Shot Generalization of Single RL Experts in Success Rates.
Figure 2: Zero-Shot Generalization of Single RL Experts in Exploration Efficiency (Number of Steps).
Figure 3: Dominant Expert Selection Map.
Figure 4: Overall Performance Comparison.
Figure 5: Success Heatmaps of MoE Mixtures (T1--T6).
...and 2 more figures
