UML-CoT: Structured Reasoning and Planning with Unified Modeling Language for Robotic Room Cleaning
Hongyu Chen, Guangrun Wang
TL;DR
This work introduces UML-CoT, a structured chain-of-thought framework that uses UML class diagrams for symbolic reasoning and UML activity diagrams for executable plans in robotic room cleaning. Replacing unstructured text with UML representations targets interpretability, verification, and planning reliability, and unifies reasoning and action under a formalism that supports inheritance, aggregation, and procedural control. The model is trained with a three-stage pipeline (SFT, RLFT with GRPO, and answer-only GRPO) and evaluated on the MRoom-30k dataset, where it shows improved plan coherence, structural fidelity, and execution success over text-based and graph-based baselines. The results highlight the practical value of structured symbolic representations for embodied AI, enabling more transparent and robust coordination between perception, reasoning, and manipulation in cluttered indoor environments.
Abstract
Chain-of-Thought (CoT) prompting improves reasoning in large language models (LLMs), but its reliance on unstructured text limits interpretability and executability in embodied tasks. Prior work has explored structured CoTs using scene or logic graphs, yet these remain fundamentally limited: they model only low-order relations, lack constructs like inheritance or behavioral abstraction, and provide no standardized semantics for sequential or conditional planning. We propose UML-CoT, a structured reasoning and planning framework that leverages Unified Modeling Language (UML) to generate symbolic CoTs and executable action plans. UML class diagrams capture compositional object semantics, while activity diagrams model procedural control flow. Our three-stage training pipeline combines supervised fine-tuning with Group Relative Policy Optimization (GRPO), including reward learning from answer-only data. We evaluate UML-CoT on MRoom-30k, a new benchmark of cluttered room-cleaning scenarios. UML-CoT outperforms unstructured CoTs in interpretability, planning coherence, and execution success, highlighting UML as a more expressive and actionable structured reasoning formalism.
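To make the division of labor between the two diagram types concrete, the following is a minimal sketch, not the paper's implementation. It mirrors a class diagram with plain Python classes (inheritance for object categories, aggregation for a room holding objects) and mirrors an activity diagram with an ordered list of actions produced by a loop containing one decision node. All names here (`RoomObject`, `Trash`, `Clothing`, `Room`, `Action`, `plan`, and the `target` attribute) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# --- Class-diagram side (hypothetical taxonomy) ---------------------------
class RoomObject:
    """Base class: any object detected in the room."""
    target: Optional[str] = None  # where the object belongs, if known

    def __init__(self, name: str) -> None:
        self.name = name


class Trash(RoomObject):
    """Inheritance: every Trash item is a RoomObject that goes to the bin."""
    target = "bin"


class Clothing(RoomObject):
    """Inheritance: every Clothing item goes to the laundry basket."""
    target = "laundry_basket"


@dataclass
class Room:
    """Aggregation: a room holds a collection of objects."""
    objects: List[RoomObject] = field(default_factory=list)


# --- Activity-diagram side (hypothetical action vocabulary) ---------------
@dataclass
class Action:
    verb: str
    obj: str
    dest: Optional[str] = None


def plan(room: Room) -> List[Action]:
    """Walk the objects in order and emit a pick/place sequence."""
    steps: List[Action] = []
    for obj in room.objects:
        # Decision node: objects without a known target are left in place.
        if obj.target is None:
            continue
        steps.append(Action("pick", obj.name))
        steps.append(Action("place", obj.name, obj.target))
    return steps


# Usage: one trash item, one clothing item, one object with no target.
room = Room([Trash("can"), Clothing("sock"), RoomObject("lamp")])
steps = plan(room)
for step in steps:
    print(step)
```

In this sketch the class hierarchy carries the symbolic knowledge (what kind of thing an object is and where it belongs), while the plan is a separate, inspectable sequence that a checker could validate step by step before execution.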
