Agent-Arena: A General Framework for Evaluating Control Algorithms

Halid Abdulrahim Kadi, Kasim Terzić

TL;DR

Agent-Arena presents a general Python framework for evaluating and generalizing control algorithms across diverse robotic domains, enabling closed-loop interaction in POMDP/MDP settings. It introduces a decoupled architecture consisting of Agent, Arena, Task, and ActionTool, configured via domain strings and YAML, to unify classical and data-driven controllers across simulation and real robots. Key contributions include a multi-agent, multi-arena workflow with a uniform trajectory data layout, visualization/logging utilities, and ROS integration to facilitate real-world transfer. The framework supports multiple benchmarks (e.g., DMControl Suite, SoftGym, Raven) and employs Ray for parallelism and a Zarr-based dataset to accelerate development and ensure reproducible experiments, ultimately enabling rapid prototyping and robust cross-domain evaluation of controllers.

Abstract

Robotic research is inherently challenging, requiring expertise in diverse environments and control algorithms. Adapting algorithms to new environments often poses significant difficulties, compounded by the need for extensive hyper-parameter tuning in data-driven methods. To address these challenges, we present Agent-Arena, a Python framework designed to streamline the integration, replication, development, and testing of decision-making policies across a wide range of benchmark environments. Unlike existing frameworks, Agent-Arena is uniquely generalised to support all types of control algorithms and is adaptable to both simulation and real-robot scenarios. Please see our GitHub repository https://github.com/halid1020/agent-arena-v0.

Paper Structure

This paper contains 16 sections, 1 figure.

Figures (1)

  • Figure 1: Agent-Arena Framework. Both the agent and a list of arenas can be initialised using agent and domain strings (represented by green and blue arrows); for data-driven controllers, the agent is additionally constructed from its corresponding configuration files. The Agent (green block) maintains a set of internal states (blue block with a dashed boundary) for each of the created arenas (blue blocks with solid boundaries). It returns an appropriate set of actions (cyan arrows) for the arenas after receiving information (red arrows) from them. Upon construction, each arena has class variables for the intended task (yellow block), action tools (purple block), and underlying dynamics (red block). The information provided to the agent is aggregated from these three components. This framework currently supports the integration of the DeepMind Control Suite (tunyasuvunakool2020dmcontrol), Raven (zeng2021transporter), and SoftGym (lin2020softgym) benchmark environments. We use the first two for sanity checks on the baseline control algorithms used in this thesis, and SoftGym for the development of cloth-manipulation algorithms.
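
The interaction pattern in the Figure 1 caption (one agent, several arenas, per-arena internal state, information in and actions out) can be illustrated with a small toy loop. This is only a sketch under stated assumptions and not the Agent-Arena API: `ToyArena`, `ToyAgent`, `run_episode`, the dictionary keys, and the 1-D point dynamics are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyArena:
    """Hypothetical arena: a 1-D point that should reach a goal position."""
    arena_id: int
    goal: float            # stands in for the "task" component
    position: float = 0.0  # state of the underlying "dynamics"

    def reset(self) -> dict:
        self.position = 0.0
        return self._info()

    def step(self, action: float) -> dict:
        # Stand-in for an "action tool": clip the raw command to [-1, 1].
        clipped = max(-1.0, min(1.0, action))
        # Stand-in for the dynamics: integrate the clipped command.
        self.position += clipped
        return self._info()

    def _info(self) -> dict:
        # Information handed to the agent, aggregated from dynamics and task.
        return {
            "arena_id": self.arena_id,
            "observation": self.position,
            "goal": self.goal,
            "done": abs(self.goal - self.position) < 1e-3,
        }


@dataclass
class ToyAgent:
    """Hypothetical agent that keeps one internal state per arena."""
    gain: float = 0.5
    internal_states: Dict[int, dict] = field(default_factory=dict)

    def reset(self, arena_ids: List[int]) -> None:
        for arena_id in arena_ids:
            self.internal_states[arena_id] = {"steps": 0}

    def act(self, infos: List[dict]) -> List[float]:
        actions = []
        for info in infos:
            self.internal_states[info["arena_id"]]["steps"] += 1
            # Simple proportional controller towards the goal.
            actions.append(self.gain * (info["goal"] - info["observation"]))
        return actions


def run_episode(agent: ToyAgent, arenas: List[ToyArena], max_steps: int = 50) -> List[dict]:
    """Closed loop: information flows to the agent, actions flow back."""
    agent.reset([arena.arena_id for arena in arenas])
    infos = [arena.reset() for arena in arenas]
    for _ in range(max_steps):
        active = [i for i, info in enumerate(infos) if not info["done"]]
        if not active:
            break
        actions = agent.act([infos[i] for i in active])
        for i, action in zip(active, actions):
            infos[i] = arenas[i].step(action)
    return infos


if __name__ == "__main__":
    arenas = [ToyArena(arena_id=0, goal=2.0), ToyArena(arena_id=1, goal=-3.0)]
    agent = ToyAgent()
    for info in run_episode(agent, arenas):
        print(info)
```

Keeping the per-arena state in a dictionary keyed by arena identifier is one simple way to let a single agent object serve several arenas at once; the actual framework may organise this differently.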