mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
Remo Steiner, Alexander Millane, David Tingdahl, Clemens Volk, Vikram Ramasamy, Xinjie Yao, Peter Du, Soha Pouya, Shiwei Sheng
TL;DR
mindmap tackles the lack of spatial memory in end-to-end robotic manipulation by fusing a diffusion-based trajectory policy with a metric-semantic 3D reconstruction of the scene. By conditioning trajectory diffusion on both the current RGB-D observation and a reconstruction that is built up incrementally as the robot moves, the approach enables actions that depend on objects and geometry outside the current field of view. The authors introduce architectural and data-processing changes, use a reconstruction pipeline that is non-differentiable but runs in real time, and report improvements over memory-free baselines on four memory-dependent tasks in simulation. They also release their reconstruction tools and training code. The work highlights the importance of spatial memory for robust manipulation when relevant objects leave the field of view, and points toward scalable memory-augmented policies for real-world robotics.
Abstract
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. In many common tasks, relevant objects pass in and out of the robot's field of view. In these settings, spatial memory (the ability to remember the spatial composition of the scene) is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
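To make the idea of a spatial memory in a feature map concrete, here is a minimal toy sketch. It is not the authors' reconstruction system or policy: the class `FeatureVoxelMap`, the function `build_conditioning`, the running-average fusion rule, and the voxel size are all hypothetical choices made for illustration. It shows only the core mechanism described above: per-point features from successive views are fused into a persistent 3D map, and the policy's conditioning set combines the current view with everything stored in the map, including regions that are no longer visible.

```python
import math


class FeatureVoxelMap:
    """Toy spatial memory (hypothetical): mean feature vector per occupied voxel."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self._sum = {}    # voxel index -> element-wise sum of fused features
        self._count = {}  # voxel index -> number of fused observations

    def _key(self, point):
        # Map a metric 3D point to its integer voxel index.
        return tuple(int(math.floor(c / self.voxel_size)) for c in point)

    def integrate(self, points, features):
        """Fuse one frame of 3D points and their feature vectors into the map."""
        for point, feature in zip(points, features):
            key = self._key(point)
            if key not in self._sum:
                self._sum[key] = [0.0] * len(feature)
                self._count[key] = 0
            self._sum[key] = [s + f for s, f in zip(self._sum[key], feature)]
            self._count[key] += 1

    def tokens(self):
        """Return (voxel_center, mean_feature) for every voxel seen so far."""
        out = []
        for key in sorted(self._sum):
            n = self._count[key]
            center = tuple((i + 0.5) * self.voxel_size for i in key)
            mean = [s / n for s in self._sum[key]]
            out.append((center, mean))
        return out


def build_conditioning(current_points, current_features, memory):
    """Concatenate current-view tokens with tokens from the persistent map."""
    current = list(zip(current_points, current_features))
    return current + memory.tokens()


# Frame 1 sees an object near x = 0.5 m; frame 2 looks elsewhere.
memory = FeatureVoxelMap(voxel_size=0.1)
memory.integrate([(0.52, 0.0, 0.0)], [[1.0, 0.0]])
memory.integrate([(2.03, 0.0, 0.0)], [[0.0, 1.0]])

# The conditioning set still contains the object from frame 1.
context = build_conditioning([(2.03, 0.0, 0.0)], [[0.0, 1.0]], memory)
print(len(context))
```

In a real system the conditioning tokens would be passed to the denoising network of the diffusion policy; a policy that received only the current-view tokens would have no information about the object observed in frame 1.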
