Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs

Changhao Li; Yuchen Zhuang; Rushi Qiang; Haotian Sun; Hanjun Dai; Chao Zhang; Bo Dai

TL;DR

Matryoshka Pilot (M-Pilot) introduces a lightweight white-box LLM controller that drives a black-box LLM by generating intermediate guidance and treating the black-box generator as an interactive environment for multi-turn problem solving. It formalizes the setup as a Markov decision process (MDP) and uses Iterative Direct Preference Optimization (IDPO) with a Bradley–Terry preference model and KL-regularized planning to continually improve guidance without accessing the black-box parameters, achieving self-improvement through feedback. Across personalization (LaMP), reasoning (GSM8K), and planning (ALFWorld), M-Pilot delivers consistent improvements over strong baselines and supports plug-and-play deployment with different black-box models, demonstrating data efficiency and transferability. The work highlights the potential of a transparent, scalable controller–environment framework to enhance long-horizon tasks in black-box LLMs while acknowledging societal and safety considerations.
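To make the preference-learning step concrete, the sketch below computes the standard DPO loss for a single preference pair under a Bradley–Terry model. It is an illustration of the generic objective, not the authors' IDPO implementation: the function names, the scalar log-probability inputs, and the default `beta` value are assumptions introduced here.

```python
import math


def bradley_terry_prob(reward_preferred: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the preferred item beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected)))


def dpo_loss(
    logp_preferred: float,
    logp_rejected: float,
    ref_logp_preferred: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls the strength of KL regularization
) -> float:
    """Standard DPO loss for one (preferred, rejected) pair of guidance sequences.

    Inputs are sequence log-probabilities under the trainable controller
    (logp_*) and under a frozen reference controller (ref_logp_*).
    """
    # Implicit rewards: scaled log-ratio between policy and reference.
    reward_preferred = beta * (logp_preferred - ref_logp_preferred)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-likelihood of the observed preference under Bradley-Terry.
    return -math.log(bradley_terry_prob(reward_preferred, reward_rejected))
```

When the controller matches the reference on both sequences, the loss equals log 2; it falls as the controller shifts probability toward the preferred guidance relative to the reference.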

Abstract

Despite the impressive generative abilities of black-box large language models (LLMs), their inherent opacity hinders further advancements in capabilities such as reasoning, planning, and personalization. Existing works aim to enhance LLM capabilities via domain-specific adaptation, which requires additional training on accessible model parameters, an infeasible option for black-box LLMs. To address this challenge, we introduce Matryoshka Pilot (M-Pilot), a lightweight white-box LLM controller that guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs. Specifically, we consider the black-box LLM as an environment, with M-Pilot serving as a policy to provide intermediate guidance through prompts for driving the black-box LLM. M-Pilot is trained to pivot the outputs of the black-box LLM toward alignment with preferences during iterative interaction, which enables controllable multi-turn generation and self-improvement in optimizing intermediate guidance. Empirical evaluations on diverse tasks demonstrate that our method effectively enhances the capabilities of black-box LLMs in complex, long-horizon tasks. Our code is publicly available at: https://github.com/lichangh20/Matryoshka.
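The controller-as-policy, black-box-as-environment interaction described above can be sketched as a simple multi-turn loop. This is a minimal illustration under stated assumptions: `run_episode`, the callable signatures, and the `max_turns` parameter are hypothetical names introduced here and are not taken from the released code.

```python
from typing import Callable, List, Tuple

# One turn of interaction: (guidance from the controller, output from the black box).
Turn = Tuple[str, str]


def run_episode(
    controller: Callable[[str, List[Turn]], str],  # hypothetical white-box policy
    black_box: Callable[[str, str], str],          # hypothetical black-box LLM call
    task: str,
    max_turns: int = 3,
) -> List[Turn]:
    """Roll out a multi-turn interaction and return the full trajectory."""
    history: List[Turn] = []
    for _ in range(max_turns):
        # The controller observes the task and past turns, then emits guidance.
        guidance = controller(task, history)
        # The black-box LLM acts as the environment: it only sees prompts.
        output = black_box(task, guidance)
        history.append((guidance, output))
    return history
```

Trajectories collected this way could then be compared against each other to form preference pairs for training the controller, while the black-box model's parameters are never touched.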
