Agent-Arena: A General Framework for Evaluating Control Algorithms

Halid Abdulrahim Kadi; Kasim Terzić

TL;DR

Agent-Arena is a general Python framework for evaluating control algorithms and generalizing them across diverse robotic domains, with closed-loop interaction in POMDP/MDP settings. It introduces a decoupled architecture consisting of Agent, Arena, Task, and ActionTool, configured via domain strings and YAML, to unify classical and data-driven controllers across simulation and real robots. Key contributions include a multi-agent, multi-arena workflow with a uniform trajectory data layout, visualization and logging utilities, and ROS integration to facilitate real-world transfer. The framework supports multiple benchmarks (e.g., DMControl Suite, SoftGym, Raven), uses Ray for parallelism, and stores data in a Zarr-based dataset, which speeds up development and supports reproducible experiments, rapid prototyping, and cross-domain evaluation of controllers.
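To make the decoupled agent/arena idea concrete, the following is a minimal sketch of a closed-loop rollout that records a trajectory in one uniform layout. It is not Agent-Arena's actual API: the class names `PointArena` and `ProportionalAgent`, the `rollout` helper, the method names `reset`, `step`, and `act`, and the dictionary keys are all hypothetical and chosen only for illustration. The arena is a toy one-dimensional system and the agent is a classical proportional controller.

```python
from typing import Any, Dict, List


class PointArena:
    """Hypothetical toy arena: a 1-D position that should reach `goal`."""

    def __init__(self, goal: float = 1.0, horizon: int = 20) -> None:
        self.goal = goal
        self.horizon = horizon
        self.position = 0.0
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.position = 0.0
        self.t = 0
        return self._info()

    def step(self, action: float) -> Dict[str, Any]:
        # Clip the action to mimic an actuator limit.
        clipped = max(-0.5, min(0.5, action))
        self.position += clipped
        self.t += 1
        return self._info()

    def _info(self) -> Dict[str, Any]:
        error = self.goal - self.position
        done = abs(error) < 1e-3 or self.t >= self.horizon
        return {
            "observation": self.position,
            "goal": self.goal,
            "reward": -abs(error),
            "done": done,
        }


class ProportionalAgent:
    """Hypothetical classical controller: action = gain * (goal - observation)."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain

    def reset(self) -> None:
        # A stateless controller has nothing to reset; a learned or
        # recurrent policy would clear its internal state here.
        pass

    def act(self, info: Dict[str, Any]) -> float:
        return self.gain * (info["goal"] - info["observation"])


def rollout(agent: ProportionalAgent, arena: PointArena) -> Dict[str, List[float]]:
    """Run one closed-loop episode and return a trajectory with a fixed layout."""
    trajectory: Dict[str, List[float]] = {"observation": [], "action": [], "reward": []}
    info = arena.reset()
    agent.reset()
    trajectory["observation"].append(info["observation"])
    while not info["done"]:
        action = agent.act(info)
        info = arena.step(action)
        trajectory["action"].append(action)
        trajectory["observation"].append(info["observation"])
        trajectory["reward"].append(info["reward"])
    return trajectory


if __name__ == "__main__":
    traj = rollout(ProportionalAgent(gain=0.5), PointArena(goal=1.0))
    print(len(traj["action"]), traj["observation"][-1])
```

Because `rollout` only relies on the `reset`/`step`/`act` methods, any agent or arena exposing the same methods could be swapped in without changing the loop, which is the kind of decoupling the TL;DR describes.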

Abstract

Robotic research is inherently challenging, requiring expertise in diverse environments and control algorithms. Adapting algorithms to new environments often poses significant difficulties, compounded by the need for extensive hyper-parameter tuning in data-driven methods. To address these challenges, we present Agent-Arena, a Python framework designed to streamline the integration, replication, development, and testing of decision-making policies across a wide range of benchmark environments. Unlike existing frameworks, Agent-Arena is generalised to support all types of control algorithms and is adaptable to both simulation and real-robot scenarios. The code is available in our GitHub repository: https://github.com/halid1020/agent-arena-v0.
