Steering LLMs via Scalable Interactive Oversight
Enyu Zhou, Zhiheng Xi, Long Ma, Zhihao Zhang, Shihan Dou, Zhikai Lei, Guoteng Wang, Rui Zheng, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang
TL;DR
The paper tackles the challenge of aligning powerful LLMs with imperfect human intent on long-horizon tasks by introducing Scalable Interactive Oversight, a recursive, tree-structured interaction framework that elicits low-burden feedback at leaf nodes and aggregates it into global guidance before execution. It validates the approach on a vibe-coding task, generating Product Requirement Documents (PRDs) for web development, where it achieves up to a 54% improvement in alignment over baselines and supports online RL from human feedback that further improves performance and efficiency. The evaluation rests on the Sandwich Protocol, which combines a non-expert supervisor, a capable model, and an expert evaluator to bound the achievable alignment and to guide the method's design. The work also shows that reinforcement learning with online human feedback, optionally combined with expert rewards, generalizes to modules not seen during training and improves interaction efficiency, offering a practical pathway for maintaining human control as AI scales. Overall, the framework advances the controllability of AI through structured, scalable human supervision that preemptively translates vague intent into precise, verifiable specifications.
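The Sandwich Protocol can be pictured as an evaluation harness in which every experimental condition produces a final document and only the expert evaluator scores it. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the condition names, the use of an unassisted baseline as the lower bound and an expert-guided run as the upper bound, the `gap_closed` normalization, and the toy scoring function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signatures: a condition maps a task brief to a final document
# (e.g. a PRD); the expert evaluator maps a document to a score in [0, 1].
Condition = Callable[[str], str]
Evaluator = Callable[[str], float]


@dataclass
class SandwichResult:
    scores: Dict[str, float]  # mean expert score per condition
    lower: float              # assumed lower bound (unassisted baseline)
    upper: float              # assumed upper bound (expert-guided reference)

    def gap_closed(self, name: str) -> float:
        """Fraction of the lower-to-upper gap that one condition closes."""
        span = self.upper - self.lower
        return 0.0 if span == 0 else (self.scores[name] - self.lower) / span


def sandwich_eval(tasks: List[str], conditions: Dict[str, Condition],
                  expert: Evaluator, lower_key: str, upper_key: str) -> SandwichResult:
    # The expert only scores outputs; it never steers the non-expert conditions.
    scores = {name: sum(expert(run(t)) for t in tasks) / len(tasks)
              for name, run in conditions.items()}
    return SandwichResult(scores, scores[lower_key], scores[upper_key])


# Toy usage: each "+feature" in a document counts as one requirement the expert wants.
tasks = ["todo app", "blog"]
conditions = {
    "baseline": lambda t: t,
    "interactive": lambda t: t + " +auth +storage",
    "expert_guided": lambda t: t + " +auth +storage +tests",
}
expert = lambda doc: doc.count("+") / 3
result = sandwich_eval(tasks, conditions, expert, "baseline", "expert_guided")
print(result.scores)
print(round(result.gap_closed("interactive"), 3))
```

Normalizing by the gap between the two bounds keeps scores comparable across tasks of different difficulty; a real study would replace the toy lambdas with human participants and a rubric-based expert review.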
Abstract
As Large Language Models increasingly automate complex, long-horizon tasks such as vibe coding, a supervision gap has emerged. While models excel at execution, users often struggle to guide them effectively because they lack domain expertise, find it difficult to articulate precise intent, and cannot reliably validate complex outputs. This presents a critical challenge for scalable oversight: enabling humans to responsibly steer AI systems on tasks that exceed their own ability to specify or verify. To tackle this, we propose Scalable Interactive Oversight, a framework that decomposes complex intent into a recursive tree of manageable decisions to amplify human supervision. Rather than relying on open-ended prompting, our system elicits low-burden feedback at each node and recursively aggregates these signals into precise global guidance. Validated on a web development task, our framework enables non-experts to produce expert-level Product Requirement Documents, achieving a 54% improvement in alignment. Crucially, we demonstrate that this framework can be optimized via Reinforcement Learning using only online user feedback, offering a practical pathway for maintaining human control as AI scales.
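The recursive decompose, elicit, and aggregate loop can be sketched as a tree walk. This minimal sketch assumes a fixed, hand-written tree and a scripted user in place of a live interaction: each leaf asks for a single multiple-choice decision (the low-burden signal), and each internal node merges its children's answers into one section of the specification. `DecisionNode`, `elicit`, and `scripted_user` are hypothetical names introduced for illustration, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DecisionNode:
    """Hypothetical node: internal nodes group sub-decisions, leaves ask one question."""
    topic: str
    options: List[str] = field(default_factory=list)  # choices offered at a leaf
    children: List["DecisionNode"] = field(default_factory=list)


# Low-burden feedback: given a topic and its options, return one option or None to skip.
ChooseFn = Callable[[str, List[str]], Optional[str]]


def elicit(node: DecisionNode, choose: ChooseFn, depth: int = 0) -> List[str]:
    """Collect feedback at the leaves and aggregate it bottom-up into spec lines."""
    indent = "  " * depth
    if not node.children:
        picked = choose(node.topic, node.options)
        return [f"{indent}- {node.topic}: {picked}"] if picked else []
    lines: List[str] = []
    for child in node.children:
        lines.extend(elicit(child, choose, depth + 1))
    # An internal node contributes a section header only if a descendant was answered.
    return [f"{indent}{node.topic}"] + lines if lines else []


def scripted_user(topic: str, options: List[str]) -> Optional[str]:
    """Stand-in for a real user: answers from fixed preferences, skips the rest."""
    prefs = {"Sign-in method": "OAuth", "Storage": "cloud sync"}
    choice = prefs.get(topic)
    return choice if choice in options else None


tree = DecisionNode("Todo web app", children=[
    DecisionNode("Accounts", children=[
        DecisionNode("Sign-in method", options=["email", "OAuth", "none"]),
    ]),
    DecisionNode("Storage", options=["local only", "cloud sync"]),
    DecisionNode("Theme", options=["light", "dark"]),
])

spec = elicit(tree, scripted_user)
print("\n".join(spec))
```

The aggregated outline is what would be handed to the executing model as global guidance. Skipped leaves (here, "Theme") drop out, so sections with no answered decisions never reach the final specification.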
