Mol-MoE: Training Preference-Guided Routers for Molecule Generation

Diego Calanzone, Pierluca D'Oro, Pierre-Luc Bacon

TL;DR

The paper addresses the challenge of de-novo molecule design under multiple objectives by proposing Mol-MoE, a mixture-of-experts architecture that enables test-time steering without retraining. It introduces a preference-guided router objective that encodes user-specified trade-offs in the input prompt and learns to dynamically weight expert contributions via reinforcement learning. Compared to MORLHF, RiC, and RS, Mol-MoE delivers superior sample quality and tighter adherence to target preferences, including robust out-of-distribution performance and scalable handling of increasing numbers of properties. This approach offers practical advantages for rapid exploration of chemical trade-offs in drug design, reducing retraining costs while enabling precise control over multi-objective outcomes.

Abstract

Recent advances in language models have enabled framing molecule generation as sequence modeling. However, existing approaches often rely on single-objective reinforcement learning, limiting their applicability to real-world drug design, where multiple competing properties must be optimized. Traditional multi-objective reinforcement learning (MORL) methods require costly retraining for each new objective combination, making rapid exploration of trade-offs impractical. To overcome these limitations, we introduce Mol-MoE, a mixture-of-experts (MoE) architecture that enables efficient test-time steering of molecule generation without retraining. Central to our approach is a preference-based router training objective that incentivizes the router to combine experts in a way that aligns with user-specified trade-offs. This provides improved flexibility in exploring the chemical property space at test time, facilitating rapid trade-off exploration. Benchmarking against state-of-the-art methods, we show that Mol-MoE achieves superior sample quality and steerability.
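The routing idea described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the linear router, the `route` and `mix_expert_outputs` helpers, and all numbers are hypothetical, and in Mol-MoE the preferences are encoded in the prompt and the routers are trained with reinforcement learning. The sketch only shows how a preference vector could be turned into mixing weights that combine expert outputs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(preferences, router_weights):
    """Hypothetical linear router: one score per expert, normalized by softmax."""
    scores = [
        sum(w * p for w, p in zip(row, preferences))
        for row in router_weights
    ]
    return softmax(scores)


def mix_expert_outputs(expert_outputs, gate):
    """Weighted sum of per-expert output vectors using the gate weights."""
    dim = len(expert_outputs[0])
    return [
        sum(g * out[i] for g, out in zip(gate, expert_outputs))
        for i in range(dim)
    ]


# Illustrative setup (assumed values): two experts, two properties.
preferences = [0.8, 0.2]                   # user-specified trade-off
router_weights = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned router parameters
expert_outputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

gate = route(preferences, router_weights)
mixed = mix_expert_outputs(expert_outputs, gate)
```

In this toy setup the expert tied to the more heavily weighted property receives the larger gate value, so its output dominates the mixture.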

Paper Structure

This paper contains 26 sections, 8 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Out-of-distribution average property score by fine-tuning method. Mol-MoE outperforms the multi-task models (MORLHF, RiC, RS) and, by a significant margin, the task experts (RLHF $t_1=$JNK3, $t_2=$DRD2, $t_3=$GSK3$\beta$, $t_4=$CYP2D6, $t_5=$CYP2C19). Higher scores for experts trained on $t_4, t_5$ indicate positive transfer.
  • Figure 2: Pipeline illustration of Mol-MoE: a reference model is pre-trained on a large set of molecules; expert models are derived by RLHF-tuning on each desired molecule property; in the fine-tuned model, MoE blocks are added to combine the tuned expert layers, and only the router networks are further tuned on the routing task.
  • Figure 3: Average best task scores achieved by models trained on the full dataset (left) or with held-out high-quality samples (right), by tuning method. Mol-MoE outperforms the baselines, particularly out of distribution, where RiC fails to improve beyond the training examples. Additionally, MORLHF fails to learn more than three tasks.
  • Figure 4: Steerability error measured as Mean Absolute Error (MAE) between the conditioning values and the measured molecule properties. Mol-MoE is overall more precise than RiC and RS, particularly on GSK3$\beta$, CYP2D6, CYP2C19.
  • Figure 5: Average score on all molecule properties by number of training objectives and tuning method. Mol-MoE benefits the most from additional reward signals, particularly with respect to RS. Conversely, MORLHF fails beyond three tasks, consistent with what is observed in Figure \ref{fig:maximization_evaluation}.
  • ...and 5 more figures
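
The steerability error named in the Figure 4 caption is a Mean Absolute Error between conditioning values and measured properties. A minimal sketch of that computation follows; the `steerability_mae` helper and the numbers are made up for illustration and are not results from the paper.

```python
def steerability_mae(targets, measured):
    """Mean absolute error between conditioning targets and measured property scores."""
    if not targets or len(targets) != len(measured):
        raise ValueError("targets and measured must be non-empty and of equal length")
    return sum(abs(t - m) for t, m in zip(targets, measured)) / len(targets)


# Hypothetical values for three properties (illustrative only).
targets = [0.5, 0.5, 0.75]    # property levels requested by the user
measured = [0.25, 0.5, 1.0]   # property scores of the generated molecules
error = steerability_mae(targets, measured)
```

A lower value means the generated molecules track the requested property levels more closely.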