
MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation

Rifny Rachman, Josh Tingey, Richard Allmendinger, Wei Pan, Pradyumn Shukla, Bahrul Ilmi Nasution

TL;DR

Empirical evaluations show that MIRACL outperforms conventional MORL baselines in simple to moderate tasks, achieving up to 10% higher hypervolume and 5% better expected utility, which underscores the potential of MIRACL for robust, efficient adaptation in multi-objective problems.

Abstract

Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining and by substantial computational cost. We introduce MIRACL (Meta multI-objective Reinforcement leArning with Composite Learning), a hierarchical Meta-MORL framework that enables few-shot generalisation across diverse tasks. MIRACL decomposes each task into structured subproblems for efficient policy adaptation, and it meta-learns a global policy across tasks using a Pareto-based adaptation strategy that encourages diversity in meta-training and fine-tuning. To our knowledge, this is the first integration of Meta-MORL with task decomposition and Pareto-based adaptation in combinatorial optimisation. Although validated in the supply chain domain, MIRACL is theoretically domain-agnostic and applicable to broader dynamic multi-objective decision-making problems. Empirical evaluations show that MIRACL outperforms conventional MORL baselines in simple to moderate tasks, achieving up to 10% higher hypervolume and 5% better expected utility. These results underscore the potential of MIRACL for robust, efficient adaptation in multi-objective problems.
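The reported gains use two metrics. Hypervolume is the objective-space region dominated by a solution set, measured relative to a reference point. Expected utility averages, over preference weights, the best utility the set offers. The following is a minimal two-objective sketch of both. It assumes maximisation, a user-chosen reference point, and linear utilities. `hypervolume_2d` and `expected_utility` are hypothetical helpers, not the paper's evaluation code, and the paper's normalisation and weight sampling may differ.

```python
from typing import Sequence

# Hypothetical helpers; both objectives are assumed to be maximised.
Point = tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Ignore points that do not strictly improve on the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # a point below the current staircase adds no area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def expected_utility(points: Sequence[Point], weights: Sequence[Point]) -> float:
    """Mean, over weight vectors, of the best linear utility any point achieves."""
    best = [max(w[0] * p[0] + w[1] * p[1] for p in points) for w in weights]
    return sum(best) / len(best)


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 6.0
print(expected_utility(front, [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]))
```

The two metrics reward different things. Hypervolume credits both convergence and spread of a front. Expected utility with linear utilities credits only solutions that are best for some weight. As a result, the two can move differently for the same method.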

Paper Structure (20 sections, 7 equations, 4 figures, 4 tables, 2 algorithms)

Figures (4)

  • Figure 1: Meta-learning and adaptation phases in MIRACL. Each task is decomposed using several weight vectors, and PSA is applied between meta-updates to improve task diversity.
  • Figure 2: The fine-tuning phase requires only few-shot training because it uses the meta-policy as a good initialisation.
  • Figure 3: Normalised hypervolume comparison of MORL/D, MORL/D with SB and PSA, NSGA-II, Meta-MORL, and our proposed methods. Left panels show early time steps (proportional to NSGA-II generations), while right panels show later iterations. MIRACL consistently outperforms all baseline methods in simple and moderate problems, but is exceeded by MORL/D in complex ones.
  • Figure 4: Pareto front (PF) approximation sets of NSGA-II, MORL/D, Meta-MORL, and MIRACL across simple, moderate, and complex SC problems. Meta-learning-based methods produce more diverse solutions than NSGA-II, yet more concentrated ones than MORL/D. This concentration increases with problem complexity.
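Figure 1 describes decomposing each task with several weight vectors, and Figure 4 compares Pareto front approximation sets. The sketch below illustrates both ideas for two objectives. It assumes weighted-sum scalarisation over evenly spaced weights and a simple quadratic-time non-dominated filter. The captions do not state MIRACL's scalarisation or weight schedule, so `uniform_weights`, `scalarise`, `dominates`, and `non_dominated` are hypothetical helpers rather than the authors' implementation.

```python
from typing import Sequence

# Hypothetical helpers; objectives are assumed to be maximised.
Vector = tuple[float, ...]


def uniform_weights(n: int) -> list[tuple[float, float]]:
    """n evenly spaced weight vectors for two objectives (requires n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def scalarise(rewards: Vector, weights: Vector) -> float:
    """Weighted sum: one weight vector turns the task into one scalar subproblem."""
    return sum(w * r for w, r in zip(weights, rewards))


def dominates(a: Vector, b: Vector) -> bool:
    """a is no worse than b in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: Sequence[Vector]) -> list[Vector]:
    """Points that no other point dominates, i.e. a Pareto front approximation."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


weights = uniform_weights(3)  # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
reward = (4.0, 2.0)  # hypothetical vector reward from one episode
print([scalarise(reward, w) for w in weights])  # [2.0, 3.0, 4.0]
print(non_dominated([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]))
```

Weighted sums can only reach solutions on the convex hull of the front. When non-convex regions matter, decomposition methods often use a Tchebycheff scalarisation instead.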