MOTIF: Multi-strategy Optimization via Turn-based Interactive Framework

Nguyen Viet Tuan Kiet, Dao Van Tung, Tran Cong Dao, Huynh Thi Thanh Binh

TL;DR

MOTIF reframes solver design as a multi-strategy optimization problem and solves it with a turn-based, two-agent framework. It uses competitive Monte Carlo Tree Search to iteratively improve interdependent solver components, first in isolation and then in a system-aware phase. Dynamic baselines, structured prompting, and three operators (counter, learning, innovation) drive both adversarial pressure and collaborative refinement, achieving superior performance across TSP, CVRP, MKP, OP, and BPP benchmarks. The approach demonstrates that co-design of multiple algorithmic components via LLM-driven self-play yields more diverse and coherent solvers than traditional single-component optimization.

Abstract

Designing effective algorithmic components remains a fundamental obstacle in tackling NP-hard combinatorial optimization problems (COPs), where solvers often rely on carefully hand-crafted strategies. Despite recent advances in using large language models (LLMs) to synthesize high-quality components, most approaches restrict the search to a single element - commonly a heuristic scoring function - thus missing broader opportunities for innovation. In this paper, we introduce a broader formulation of solver design as a multi-strategy optimization problem, which seeks to jointly improve a set of interdependent components under a unified objective. To address this, we propose Multi-strategy Optimization via Turn-based Interactive Framework (MOTIF) - a novel framework based on Monte Carlo Tree Search that facilitates turn-based optimization between two LLM agents. At each turn, an agent improves one component by leveraging the history of both its own and its opponent's prior updates, promoting both competitive pressure and emergent cooperation. This structured interaction broadens the search landscape and encourages the discovery of diverse, high-performing solutions. Experiments across multiple COP domains show that MOTIF consistently outperforms state-of-the-art methods, highlighting the promise of turn-based, multi-agent prompting for fully automated solver design.

Paper Structure

This paper contains 85 sections, 34 equations, 4 figures, 8 tables, 7 algorithms.

Figures (4)

  • Figure 1: (a) Monolithic reflective pipeline: generation and reflection exchange one‑way hints around a single evaluator with minimal behavioral awareness. (b) Turn‑based interactive framework: two agents take turns generating and updating under shared evaluations, yielding explicit peer feedback, richer diversity, and adaptive explore–exploit balance.
  • Figure 2: Overview of the component-wise competition framework. Left: The outer controller selects a strategy tree $\mathcal{T}_k$ to optimize in each iteration. Right: The selected tree is improved via a two-player CMCTS, where agents alternate turns using one operator. Each move prompts the LLM with contextualized information about the current and opponent implementations, as well as prior history. Generated code is evaluated and backpropagated through the tree based on a Q-value that accounts for both absolute and relative improvements. The best solution is retained for potential system-level baseline updates.
  • Figure 3: Comparison of AHD frameworks applied to the ACO algorithm. EoH, ReEvo, and MCTS-AHD optimize only a single strategy component (the heuristic function), while MOTIF concurrently optimizes two or three strategy components. Left: Relative performance compared to the human-designed baseline. Right: Evaluation curves showing the best objective value over time (measured by the number of evaluations), averaged over three independent runs.
  • Figure 4: Convergence behavior of the MOTIF framework during training, averaged over five independent runs. Left: Best optimality gap achieved at each outer iteration, shown separately for Player 1 (P1), Player 2 (P2), and the overall best. Right: Performance breakdown by operator type.
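
The turn-taking loop described in the Figure 2 caption can be illustrated with a small sketch. This is not the paper's CMCTS: it is a flat alternating loop with no search tree and no outer controller, and the LLM call is replaced by a random perturbation of a scalar cost. The function names (`q_value`, `propose`, `turn_based_search`), the mixing weight `lam`, and the per-operator step sizes are all assumptions made for illustration; only the three operator names and the idea of a Q-value that combines absolute and relative improvement come from the text above.

```python
import random
from dataclasses import dataclass, field

# Operator names taken from the summary; their behavior here is hypothetical.
OPERATORS = ("counter", "learning", "innovation")


@dataclass
class Player:
    name: str
    best_cost: float                      # lower is better
    history: list = field(default_factory=list)


def q_value(cost, baseline, opponent_cost, lam=0.5):
    """Hypothetical reward mixing two signals.

    absolute: improvement over the shared baseline.
    relative: improvement over the opponent's current best.
    The weight `lam` is an assumption, not a value from the paper.
    """
    absolute = (baseline - cost) / abs(baseline)
    relative = (opponent_cost - cost) / abs(opponent_cost)
    return lam * absolute + (1 - lam) * relative


def propose(current_cost, operator, rng):
    """Stand-in for an LLM-generated implementation plus its evaluation.

    A real system would prompt an LLM with both players' code and history;
    here each operator only perturbs the cost by a different (made-up) scale.
    """
    scale = {"counter": 0.05, "learning": 0.02, "innovation": 0.10}[operator]
    return current_cost * (1 + rng.uniform(-scale, scale))


def turn_based_search(baseline, turns=20, seed=0):
    """Two players alternate; each turn applies one operator to one component."""
    rng = random.Random(seed)
    players = [Player("P1", baseline), Player("P2", baseline)]
    for t in range(turns):
        me, opponent = players[t % 2], players[(t + 1) % 2]
        operator = rng.choice(OPERATORS)
        cost = propose(me.best_cost, operator, rng)
        reward = q_value(cost, baseline, opponent.best_cost)
        me.history.append((operator, cost, reward))
        if cost < me.best_cost:           # keep only improving proposals
            me.best_cost = cost
    return players


if __name__ == "__main__":
    for p in turn_based_search(baseline=100.0):
        print(p.name, round(p.best_cost, 3), len(p.history))
```

Each player records the operator used, the resulting cost, and the reward, which mirrors the caption's point that every move is conditioned on both players' prior history. A tree-based variant would backpropagate the reward along the selected path instead of storing it in a flat list.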

Theorems & Definitions (4)

  • Definition 3.1: Domain, Instance, and Solution
  • Definition 3.2: Solver and Strategy
  • Definition 3.3: Strategy Space
  • Definition 3.4: Multi-strategy Optimization