EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation

Beichen Huang; Ran Cheng; Zhuozhao Li; Yaochu Jin; Kay Chen Tan

TL;DR

EvoX targets the scalability gap in Evolutionary Computation by introducing a distributed GPU-accelerated framework with a unified programming and computation model. It provides a hierarchical state management system, a workflow analyzer/executor, and an execution engine capable of running EC algorithms over heterogeneous resources via Ray or distributed JAX. The authors built a library of 50+ EC algorithms spanning single- and multi-objective optimization and integrated diverse benchmarks including numerical functions and reinforcement learning tasks (Gym, Brax). Empirical results show robust performance and scalability across multi-device and multi-node settings, with EvoX often outperforming EvoTorch in speed and memory usage, and with open-source availability on PyPI for broad adoption.

Abstract

Inspired by natural evolutionary processes, Evolutionary Computation (EC) has established itself as a cornerstone of Artificial Intelligence. Recently, with the surge in data-intensive applications and large-scale complex systems, the demand for scalable EC solutions has grown significantly. However, most existing EC infrastructures fall short of catering to the heightened demands of large-scale problem solving. While the advent of some pioneering GPU-accelerated EC libraries is a step forward, they also grapple with some limitations, particularly in terms of flexibility and architectural robustness. In response, we introduce EvoX: a computing framework tailored for automated, distributed, and heterogeneous execution of EC algorithms. At the core of EvoX lies a unique programming model to streamline the development of parallelizable EC algorithms, complemented by a computation model specifically optimized for distributed GPU acceleration. Building upon this foundation, we have crafted an extensive library comprising a wide spectrum of 50+ EC algorithms for both single- and multi-objective optimization. Furthermore, the library offers comprehensive support for a diverse set of benchmark problems, ranging from dozens of numerical test functions to hundreds of reinforcement learning tasks. Through extensive experiments across a range of problem scenarios and hardware configurations, EvoX demonstrates robust system and model performances. EvoX is open-source and accessible at: https://github.com/EMI-Group/EvoX.

Paper Structure (24 sections, 1 equation, 12 figures, 5 tables)

Introduction
Related Work
EC libraries in Python
JAX
Motivation and Requirements
Programming and Computation Models
Programming Model
Computation Model
Architecture
Hierarchical State Management
Workflow Analyzer
Workflow Executor
Execution Engine
Implementation
Experimental Study
...and 9 more sections

Figures (12)

Figure 1: The typical process of an EC algorithm. Starting with an initial population, the EC algorithm engages in problem-solving through an iterative evolutionary process. Specifically, the main loop of this process evolves the population via three primary components: reproduction, evaluation, and selection. Ultimately, the final population is output as the solution set to the problem at hand.
Figure 2: Illustration of the computation model adopted by EvoX. There are three main modules: Algorithm, Problem, and Monitor. Each iteration starts with the ask function, which generates the tensorized population $\textbf{T}_{pop}$. The population is then sent to the Problem module, where the evaluate function computes the tensorized fitness $\textbf{T}_{fit}$. Finally, the fitness is passed back to the Algorithm module through the tell function. Meanwhile, $\textbf{T}_{pop}$ and $\textbf{T}_{fit}$ can optionally be sent to the Monitor module for further processing, where the record_pop and record_fit functions record $\textbf{T}_{pop}$ and $\textbf{T}_{fit}$, respectively. In addition to this normal data flow, each module can update its own state at every function call.
Figure 3: Architecture of EvoX. The workflow analyzer sets the stage for task execution. Each node is equipped with a local workflow executor, responsible for orchestrating the ask-evaluate-tell loop. At the controller node, a global workflow executor directs the local workflow executors within this loop, employing the all-gather collective operation to harmonize fitness values obtained from the evaluate phase.
Figure 4: Hierarchical state management in EvoX: (a) state initialization, (b) state update.
Figure 5: Workflow analyzer in EvoX. The analyzer examines both computation and data components distinctly. For computation, it ascertains the optimal runtime: typically, algorithms execute on devices, monitors operate on the host, and the problem can run on either. For data, it strategically distributes data across multiple devices. The universal state is sharded to ensure distribution across all devices, while the static data is replicated consistently to each device.
...and 7 more figures
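
The ask-evaluate-tell loop described in the Figure 2 caption can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the EvoX API: the class names RandomSearch, Sphere, and Monitor, the run helper, and all parameters are hypothetical, and nested lists stand in for the tensors that a GPU implementation would use. Only the function names ask, evaluate, tell, record_pop, and record_fit come from the caption.

```python
import random


class RandomSearch:
    """Hypothetical toy algorithm exposing ask/tell (not an EvoX class)."""

    def __init__(self, pop_size, dim, seed=0):
        self.pop_size = pop_size
        self.dim = dim
        self.rng = random.Random(seed)
        # Module-local state, updated on every call.
        self.pop = None
        self.best_x = None
        self.best_f = float("inf")

    def ask(self):
        # Generate the population (T_pop) as a pop_size x dim nested list.
        self.pop = [
            [self.rng.uniform(-5.0, 5.0) for _ in range(self.dim)]
            for _ in range(self.pop_size)
        ]
        return self.pop

    def tell(self, fitness):
        # Receive the fitness values (T_fit) and keep the best solution seen so far.
        for x, f in zip(self.pop, fitness):
            if f < self.best_f:
                self.best_x, self.best_f = x, f


class Sphere:
    """Hypothetical problem: minimize the sum of squares."""

    def evaluate(self, pop):
        return [sum(v * v for v in x) for x in pop]


class Monitor:
    """Hypothetical monitor that records population shape and best fitness."""

    def __init__(self):
        self.last_pop_shape = None
        self.history = []

    def record_pop(self, pop):
        self.last_pop_shape = (len(pop), len(pop[0]))

    def record_fit(self, fit):
        self.history.append(min(fit))


def run(algorithm, problem, monitor, steps):
    # One iteration = ask -> evaluate -> (record) -> tell.
    for _ in range(steps):
        pop = algorithm.ask()
        fit = problem.evaluate(pop)
        monitor.record_pop(pop)
        monitor.record_fit(fit)
        algorithm.tell(fit)
    return algorithm.best_f


algo = RandomSearch(pop_size=8, dim=2, seed=0)
mon = Monitor()
best = run(algo, Sphere(), mon, steps=5)
```

Keeping the three modules behind separate ask, evaluate, and tell calls is what lets the evaluation step be swapped or distributed without touching the algorithm; in this sketch, replacing Sphere with another object that has an evaluate method is enough.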
