Visual Episodic Memory-based Exploration
Jack Vice, Natalie Ruiz-Sanchez, Pamela K. Douglas, Gita Sukthankar
TL;DR
The paper addresses robotic exploration under sparse extrinsic rewards by using visual episodic memory as an intrinsic motivation signal. It proposes a twin ConvLSTM autoencoder architecture that reconstructs ten-frame video sequences and uses multi-frame SSIM as the intrinsic reward, guiding exploration toward poorly predicted, dynamic spatiotemporal regions. Empirical results show that the method outperforms the Curiosity-driven Variational Autoencoder (CVAE) at detecting dynamic anomalies and at reconstructing real-world video, and they reveal catastrophic forgetting when learning continues during exploration. The work advances autonomous exploration for tasks such as search and rescue and security by using spatiotemporal visual memory to drive curiosity, with attention to practical robustness.
Abstract
In humans, intrinsic motivation is an important mechanism for open-ended cognitive development; in robots, it has been shown to be valuable for exploration. An important aspect of human cognitive development is $\textit{episodic memory}$, which enables both the recollection of past events and the projection of a subjective future. This paper explores the use of visual episodic memory as a source of intrinsic motivation for robotic exploration problems. Using a convolutional recurrent neural network autoencoder, the agent learns an efficient representation of spatiotemporal features, such that accurate sequence prediction is possible only once those features have been learned. Structural similarity between ground-truth and autoencoder-generated images is used as an intrinsic motivation signal to guide exploration. Our proposed episodic memory model also implicitly accounts for the agent's actions, motivating the robot to seek new interactive experiences rather than just areas that are visually dissimilar. When guiding robotic exploration, our proposed method outperforms the Curiosity-driven Variational Autoencoder (CVAE) at finding dynamic anomalies.
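To make the structural-similarity signal concrete, the following is a minimal sketch of how a multi-frame reward could be computed from a ground-truth frame sequence and its reconstruction. It is an illustration under stated assumptions, not the paper's implementation: the function names `global_ssim` and `intrinsic_reward` are hypothetical, the SSIM here is a simplified single-window (global) variant rather than the usual sliding-window form, and defining the reward as one minus the mean SSIM over frames is an assumption, since the abstract does not give the exact formula.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Simplified single-window SSIM between two grayscale images.

    Assumption: statistics are computed over the whole image instead of
    over local sliding windows as in the standard SSIM formulation.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def intrinsic_reward(true_frames, recon_frames):
    """Hypothetical reward: 1 - mean SSIM over a (T, H, W) frame sequence.

    Poorly reconstructed (novel) sequences give a reward near 1;
    well reconstructed (familiar) sequences give a reward near 0.
    """
    scores = [global_ssim(t, r) for t, r in zip(true_frames, recon_frames)]
    return 1.0 - float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((10, 16, 16))      # ten-frame ground-truth clip
    unrelated = rng.random((10, 16, 16))   # stand-in for a poor reconstruction
    print("familiar:", intrinsic_reward(frames, frames))
    print("novel:   ", intrinsic_reward(frames, unrelated))
```

In this sketch a perfect reconstruction yields a reward of approximately zero, while a reconstruction unrelated to the input yields a reward close to one, which is the direction needed to steer an agent toward poorly predicted sequences.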
