Knolling Bot: Teaching Robots the Human Notion of Tidiness
Yuhang Hu, Judah Goldfeder, Zhizhuo Zhang, Xinyue Zhu, Ruibo Liu, Philippe Wyder, Jiong Lin, Hod Lipson
TL;DR
This work addresses the challenge of endowing robots with a human-like sense of tidiness for home environments. It treats knolling as an autoregressive sequence prediction problem and uses a transformer architecture paired with a Gaussian Mixture Model to capture multiple valid object placements, enabling diverse, preference-aware arrangements. The approach is trained in a self-supervised manner on a large synthetic dataset and integrated into a complete pipeline with a perception module and robotic controller, achieving real-world tidying with varying object counts. The authors also release a dataset and benchmark to foster reproducibility and further study of object rearrangement with arbitrary numbers and shapes, advancing the goal of collaborative, aesthetically aware robotic assistants in living spaces.
Abstract
For robots to truly collaborate and assist humans, they must understand not only logic and instructions, but also the subtle emotions, aesthetics, and feelings that define our humanity. Human art and aesthetics are among the most elusive concepts, often difficult even for people to articulate, and without grasping these fundamentals, robots will be unable to help in many spheres of daily life. Consider the long-promised robotic butler: automating domestic chores demands more than motion planning. It requires an internal model of cleanliness and tidiness, a challenge largely unexplored by AI. To bridge this gap, we propose an approach that equips domestic robots to perform simple tidying tasks via knolling, the practice of arranging scattered items into neat, space-efficient layouts. Unlike the uniformity of industrial settings, household environments feature diverse objects and highly subjective notions of tidiness. Drawing inspiration from natural language processing (NLP), we treat knolling as a sequential prediction problem and employ a transformer-based model to forecast each object's placement. Our method learns a generalizable concept of tidiness, generates diverse solutions adaptable to varying object sets, and incorporates human preferences for personalized arrangements. This work represents a step forward in building robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.
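
To make the sequential formulation concrete, the following is a minimal sketch of autoregressive placement with a Gaussian mixture output, not the authors' implementation. The function `toy_predictor` is a hypothetical hand-written stand-in for the transformer, and the mixture weights, spacing, and standard deviations are illustrative assumptions; only the overall loop (predict mixture parameters for the next object, sample one placement, condition on it for the next step) reflects the idea described above.

```python
import numpy as np

def sample_gmm(weights, means, stds, rng):
    """Draw one 2D position from a diagonal-covariance Gaussian mixture."""
    k = rng.choice(len(weights), p=weights)  # pick a mixture component
    return rng.normal(means[k], stds[k])     # sample (x, y) from that component

def toy_predictor(sizes, placed):
    """Hypothetical stand-in for the transformer (assumption, not the paper's model).

    Returns mixture weights, means, and standard deviations for the next
    object, given all object sizes and the positions placed so far.
    """
    i = len(placed)
    if i == 0:
        means = np.zeros((2, 2))  # first object goes near the origin
    else:
        last, gap = placed[-1], 0.01
        # Two candidate spots: right of the previous object, or start of a new row.
        right = last + np.array([(sizes[i - 1][0] + sizes[i][0]) / 2 + gap, 0.0])
        below = np.array([0.0, last[1] + (sizes[i - 1][1] + sizes[i][1]) / 2 + gap])
        means = np.stack([right, below])
    weights = np.array([0.7, 0.3])   # illustrative mixture weights
    stds = np.full((2, 2), 0.002)    # illustrative placement noise
    return weights, means, stds

def knoll(sizes, predictor, seed=0):
    """Place objects one at a time, conditioning each step on earlier placements."""
    rng = np.random.default_rng(seed)
    placed = []
    for _ in range(len(sizes)):
        weights, means, stds = predictor(sizes, placed)
        placed.append(sample_gmm(weights, means, stds, rng))
    return np.array(placed)

# Example: widths and heights (in meters) of four objects.
sizes = np.array([[0.04, 0.02], [0.03, 0.03], [0.05, 0.02], [0.02, 0.02]])
layout = knoll(sizes, toy_predictor, seed=0)
```

Because each step samples from a mixture rather than regressing a single position, different seeds can yield different but equally plausible layouts for the same object set, which is the property that motivates a mixture output in the first place.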
