Dynamic Co-Optimization Compiler: Leveraging Multi-Agent Reinforcement Learning for Enhanced DNN Accelerator Performance

Arya Fayyazi, Mehdi Kamal, Massoud Pedram

TL;DR

DCOC addresses the problem of efficiently mapping DNN workloads to heterogeneous accelerators. It introduces a Dynamic Co-Optimization Compiler that combines three specialized agents in a Centralized Training with Decentralized Execution (CTDE) multi-agent reinforcement learning framework with a Confidence Sampling mechanism. The approach jointly optimizes the hardware architecture and the software configuration. It is guided by a cost model that serves as a surrogate for runtime and is updated by a central critic, and it enforces hardware/software constraints via penalties. Empirically, DCOC improves throughput by up to 37.95% (the figure reported in the abstract; about 1.17× on average in the experiments) and reduces optimization time by up to 42.2% across a range of models on a VTA++-like platform, outperforming AutoTVM and CHAMELEON. By navigating the hardware/software co-design space efficiently, the method shortens compilation without sacrificing peak performance.

Abstract

This paper introduces a novel Dynamic Co-Optimization Compiler (DCOC), which employs an adaptive Multi-Agent Reinforcement Learning (MARL) framework to enhance the efficiency of mapping machine learning (ML) models, particularly Deep Neural Networks (DNNs), onto diverse hardware platforms. DCOC incorporates three specialized actor-critic agents within MARL, each dedicated to different optimization facets: one for hardware and two for software. This cooperative strategy results in an integrated hardware/software co-optimization approach, improving the precision and speed of DNN deployments. By focusing on high-confidence configurations, DCOC effectively reduces the search space, achieving remarkable performance over existing methods. Our results demonstrate that DCOC enhances throughput by up to 37.95% while reducing optimization time by up to 42.2% across various DNN models, outperforming current state-of-the-art frameworks.
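
The two ideas named in the summary, focusing on high-confidence configurations and penalizing constraint violations, can be illustrated with a small sketch. This is not the paper's implementation: the `Candidate` fields, the fixed confidence threshold, the per-violation penalty, and all numbers below are hypothetical choices made only for illustration. The summary does not specify how DCOC computes confidence or shapes its penalties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical hardware/software configuration proposed by the agents."""
    name: str
    confidence: float            # assumed: policy confidence in [0, 1]
    predicted_throughput: float  # assumed: cost-model estimate, arbitrary units
    violations: int              # assumed: count of violated HW/SW constraints


def penalized_score(c: Candidate, penalty: float = 10.0) -> float:
    """Cost-model estimate minus a fixed penalty per violated constraint."""
    return c.predicted_throughput - penalty * c.violations


def confidence_sample(candidates, threshold: float = 0.5):
    """Keep only candidates whose confidence meets the threshold."""
    return [c for c in candidates if c.confidence >= threshold]


def select_for_measurement(candidates, threshold=0.5, penalty=10.0, k=2):
    """Filter by confidence, then rank the survivors by penalized score."""
    kept = confidence_sample(candidates, threshold)
    ranked = sorted(kept, key=lambda c: penalized_score(c, penalty), reverse=True)
    return ranked[:k]


# Illustrative values only; none of these come from the paper.
candidates = [
    Candidate("cfg_a", confidence=0.9, predicted_throughput=100.0, violations=0),
    Candidate("cfg_b", confidence=0.8, predicted_throughput=120.0, violations=3),
    Candidate("cfg_c", confidence=0.2, predicted_throughput=150.0, violations=0),
    Candidate("cfg_d", confidence=0.6, predicted_throughput=95.0, violations=0),
]

chosen = select_for_measurement(candidates)
print([c.name for c in chosen])  # ['cfg_a', 'cfg_d']
```

In this toy setup, `cfg_c` has the highest predicted throughput but is dropped for low confidence, and `cfg_b` falls behind once its three violations are penalized. Only the remaining candidates would be passed on for costly hardware measurement, which is how a confidence filter can shrink the search space.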

Paper Structure

This paper contains 15 sections, 3 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: Overall search flow of DCOC.
  • Figure 2: High-level view of the MARL Exploration Module. Each agent has a policy network and takes an action in its own environment based on feedback from the centralized critic.
  • Figure 3: Configurations over time for the ResNet-18 model (a) before and (b) after applying the Confidence Sampling (CS) method.
  • Figure 4: Comparing the achieved throughput of different frameworks over AutoTVM on VTA++.
  • Figure 5: Comparing the compilation time of different frameworks (the percentages show the speedup of DCOC compared to AutoTVM).
  • ...and 1 more figure