ReinFog: A Deep Reinforcement Learning Empowered Framework for Resource Management in Edge and Cloud Computing Environments

Zhiyu Wang; Mohammad Goudarzi; Rajkumar Buyya

TL;DR

ReinFog tackles resource management for IoT applications across edge/fog and cloud environments with a modular, DRL-powered framework that supports both centralized and distributed DRL. It enables native and library-based DRL integrations and introduces MADCP to optimize the placement of DRL components across heterogeneous nodes. Empirical results show substantial improvements in response time, energy use, and cost, with startup time and memory overhead that scale well as Workers are added, and reduced CO$_2$ emissions relative to a FogBus2 baseline. The work advances practical, extensible DRL-based IoT scheduling in multi-layer computing environments, offering a platform for rapid experimentation with and deployment of diverse DRL techniques. Future work points to security hardening, newer DRL methods, and resilience against failures to further strengthen ReinFog's applicability in real-world deployments.

Abstract

The growing IoT landscape requires effective server deployment strategies to meet demands such as real-time processing and energy efficiency, a task complicated by heterogeneous, dynamic applications and servers. To address these challenges, we propose ReinFog, a modular distributed software framework empowered with Deep Reinforcement Learning (DRL) for adaptive resource management across edge/fog and cloud environments. ReinFog enables the practical development and deployment of various centralized and distributed DRL techniques for resource management in these environments. It also supports integrating native and library-based DRL techniques for diverse IoT application scheduling objectives, and it allows deployment configurations to be customized for different DRL techniques, including the number and placement of DRL Learners and DRL Workers in large-scale distributed systems. In addition, we propose MADCP, a novel Memetic Algorithm for DRL Component Placement (e.g., of DRL Learners and DRL Workers) in ReinFog, which combines the strengths of the Genetic Algorithm, the Firefly Algorithm, and Particle Swarm Optimization. Experiments show that the DRL mechanisms developed within ReinFog significantly improve the implementation of both centralized and distributed DRL techniques. These improvements translate into notable gains in IoT application performance: response time falls by 45%, energy consumption by 39%, and weighted cost by 37%, while scheduling overhead remains minimal. ReinFog also scales well: increasing the number of DRL Workers from 1 to 30 adds only 0.3 seconds to startup time and around 2 MB of RAM per Worker. Finally, MADCP accelerates the convergence rate of DRL techniques by up to 38%.
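The abstract reports a "weighted cost" alongside response time and energy consumption but does not define it here. As a minimal sketch, assuming the weighted cost is a convex combination of response time and energy, each normalized against a baseline run, it could be computed as follows. The names `Measurement` and `weighted_cost`, the equal default weights, and the sample values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One application run (hypothetical container, not part of ReinFog)."""
    response_time_s: float  # end-to-end response time in seconds
    energy_j: float         # energy consumed in joules


def weighted_cost(run, baseline, w_time=0.5, w_energy=0.5):
    """Assumed form: weighted sum of baseline-normalized time and energy.

    A value of 1.0 means "as costly as the baseline"; lower is better.
    """
    if abs(w_time + w_energy - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if baseline.response_time_s <= 0 or baseline.energy_j <= 0:
        raise ValueError("baseline values must be positive")
    time_term = run.response_time_s / baseline.response_time_s
    energy_term = run.energy_j / baseline.energy_j
    return w_time * time_term + w_energy * energy_term


# Illustrative numbers only; they are not measurements from the paper.
baseline = Measurement(response_time_s=2.0, energy_j=10.0)
improved = Measurement(response_time_s=1.0, energy_j=5.0)
cost_baseline = weighted_cost(baseline, baseline)
cost_improved = weighted_cost(improved, baseline)
```

Under this assumed definition, shifting the weights toward `w_time` favors latency-sensitive scheduling objectives, while shifting them toward `w_energy` favors energy-constrained ones.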
