Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

Egor Cherepanov; Nikita Kachaev; Alexey K. Kovalev; Aleksandr I. Panov

TL;DR

This work addresses the lack of a universal memory benchmark for reinforcement learning by introducing MIKASA, a two-part benchmark suite consisting of MIKASA-Base (a unified, Gymnasium-based collection of memory tasks) and MIKASA-Robo (32 memory-intensive robotic manipulation tasks). It formalizes a four-way memory taxonomy, provides memory-focused datasets for offline RL, and evaluates online, offline, and VLA baselines to reveal current limitations in memory-enabled agents. The results show that even memory-augmented architectures struggle as memory demands increase, underscoring the need for specialized memory mechanisms in realistic robotic tasks. By offering installable tooling, standardized evaluation, and rich datasets, MIKASA aims to accelerate the development of robust memory-aware RL systems for real-world applications.

Abstract

Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base -- a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo (pip install mikasa-robo-suite) -- a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our work introduces a unified framework to advance memory RL research, enabling more robust systems for real-world use. MIKASA is available at https://tinyurl.com/membenchrobots.
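
To make the notion of a memory-intensive task under partial observability concrete, the following is a minimal sketch of a toy "remember the cue" problem. It is not part of MIKASA and does not use the MIKASA-Base or MIKASA-Robo API; the class `RememberCueEnv`, the two policies, and the helper `success_rate` are hypothetical names introduced only for illustration. A cue is visible in the first observation only, and the agent is rewarded if its final action matches that cue after several steps in which the cue is hidden.

```python
import random


class RememberCueEnv:
    """Toy task (illustrative only): the cue is observable at t=0 and hidden afterwards."""

    HIDDEN = -1  # placeholder observation returned once the cue is no longer visible

    def __init__(self, delay=5, seed=None):
        # delay >= 2 guarantees the final action is chosen from a hidden observation
        assert delay >= 2
        self.delay = delay
        self.rng = random.Random(seed)

    def reset(self):
        self.cue = self.rng.randint(0, 1)
        self.t = 0
        return self.cue  # the only observation that reveals the cue

    def step(self, action):
        self.t += 1
        done = self.t == self.delay
        # Reward is given only at the last step, if the action equals the cue.
        reward = 1.0 if done and action == self.cue else 0.0
        return self.HIDDEN, reward, done


def memoryless_policy(obs, state):
    # Acts on the current observation alone; falls back to 0 when the cue is hidden.
    return (obs if obs in (0, 1) else 0), state


def memory_policy(obs, state):
    # Stores the cue when it is visible and replays it at every later step.
    if obs in (0, 1):
        state = obs
    return state, state


def run_episode(env, policy):
    obs, state, done, total = env.reset(), None, False, 0.0
    while not done:
        action, state = policy(obs, state)
        obs, reward, done = env.step(action)
        total += reward
    return total


def success_rate(policy, episodes=200, seed=0):
    env = RememberCueEnv(seed=seed)
    return sum(run_episode(env, policy) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    print("memoryless:", success_rate(memoryless_policy))
    print("with memory:", success_rate(memory_policy))
```

In this sketch the policy that stores the cue solves every episode, while the memoryless policy succeeds only when the hidden cue happens to equal its default guess. Increasing `delay` lengthens the interval over which the cue must be retained, which is the kind of growing memory demand the benchmark is designed to probe.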
