Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search
Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok
TL;DR
This work tackles multi-attribute controllable summarization by reframing it as adaptive planning with Monte Carlo Tree Search (MCTS). In PACO, each node is a summary and each action adjusts a single attribute; the method distinguishes deterministic targets from non-deterministic alignments and uses PUCT-based selection with a local reward/feasibility heuristic to guide the search. Empirically, PACO achieves robust controllability on MACSum_Dial, MACSum_Doc, and DialogSum: 1B models rival larger baselines, and 70B models deliver state-of-the-art control while preserving summary quality. Planning happens entirely at test time, without attribute-specific training. The approach offers practical, training-free flexibility across diverse domains, though at a higher compute cost; future work could optimize search efficiency and extend to broader quality dimensions and attribute types.
Abstract
Controllable summarization moves beyond generic outputs toward human-aligned summaries guided by specified attributes. In practice, the interdependence among attributes makes it challenging for language models to satisfy correlated constraints consistently. Moreover, previous approaches often require per-attribute fine-tuning, limiting flexibility across diverse summary attributes. In this paper, we propose adaptive planning for multi-attribute controllable summarization (PACO), a training-free framework that reframes the task as planning the order of sequential attribute control with a customized Monte Carlo Tree Search (MCTS). In PACO, nodes represent summaries, and actions correspond to single-attribute adjustments, enabling progressive refinement of only the attributes requiring further control. This strategy adaptively discovers optimal control orders, ultimately producing summaries that effectively meet all constraints. Extensive experiments across diverse domains and models demonstrate that PACO achieves robust multi-attribute controllability, surpassing both LLM-based self-planning models and fine-tuned baselines. Remarkably, PACO with Llama-3.2-1B rivals the controllability of the much larger Llama-3.3-70B baselines. With larger models, PACO achieves superior control performance, outperforming all competitors.
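To make the search formulation concrete, the following is a minimal sketch of PUCT-based selection over summary nodes, where each child is reached by adjusting one attribute. It is an illustration under stated assumptions, not PACO's actual implementation: the `Node` class, the uniform priors, the `c_puct` value, the action names, and the example statistics are all hypothetical, and the scoring rule is the standard PUCT formula, Q + c_puct * P * sqrt(N_parent) / (1 + N_child), rather than the paper's customized variant with its local reward/feasibility heuristic.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search node holding one candidate summary (hypothetical structure)."""
    summary: str
    prior: float = 1.0        # P(s, a): prior of the action that led to this node
    visits: int = 0           # N(s, a): how often this node was visited
    value_sum: float = 0.0    # total reward accumulated through backups
    children: dict = field(default_factory=dict)  # action name -> child Node

    @property
    def q(self) -> float:
        """Mean reward Q(s, a); defined as 0 for unvisited nodes."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.0) -> float:
    """Standard PUCT: exploitation term Q plus a prior-weighted exploration bonus."""
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + explore


def select_action(parent: Node, c_puct: float = 1.0) -> str:
    """Pick the single-attribute adjustment whose child has the highest PUCT score."""
    return max(
        parent.children,
        key=lambda action: puct_score(parent, parent.children[action], c_puct),
    )


def backup(path: list, reward: float) -> None:
    """Propagate a reward to every node on the path from the root to a leaf."""
    for node in path:
        node.visits += 1
        node.value_sum += reward


# Illustrative statistics only; the action names are assumed, not taken from the paper.
root = Node(summary="draft summary", visits=4)
root.children = {
    "length": Node("shorter draft", prior=0.5, visits=3, value_sum=1.5),
    "topic": Node("topic-focused draft", prior=0.5, visits=1, value_sum=0.9),
}
chosen = select_action(root)
```

With these numbers, the less-visited `topic` child is chosen: it has both the higher mean reward and the larger exploration bonus. In a full search loop, the selected child would be expanded by regenerating the summary for that one attribute, scored, and its reward passed to `backup` along the visited path.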
