Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

Yuyang Zou, Youwei Xiao, Yansong Xu, Chenyun Yin, Yuhao Luo, Yitian Sun, Ruifan Xu, Renze Chen, Yun Liang

TL;DR

Aquas is a holistic MLIR-based framework for ASIP hardware-software co-design that targets two bottlenecks: restricted hardware synthesis and poor compiler retargetability. On the hardware side, it pairs a memory subsystem with a burst DMA engine with advanced HLS optimizations. On the compiler side, it uses an e-graph-based retargetable compiler with skeleton-components pattern matching to map workloads onto custom instruction extensions (ISAXs). In case studies on point cloud processing and CPU LLM inference, Aquas reports kernel and end-to-end speedups of up to 9.27x while maintaining favorable area overhead. Keeping both sides in one MLIR-based toolchain enables tighter hardware-software co-optimization and faster iteration across the design space.

Abstract

Application-Specific Instruction-Set Processors (ASIPs) built on the RISC-V architecture offer specialization opportunities for various applications. However, existing frameworks from the open-source RISC-V ecosystem suffer from limited performance due to restricted hardware synthesis and rigid compiler support. To address these challenges, we introduce Aquas, a holistic hardware-software co-design framework built upon MLIR. Aquas enhances ASIP synthesis with fast memory access capability via a burst DMA engine and advanced high-level synthesis (HLS) optimizations. On the compiler side, we propose an e-graph-based retargetable approach with a novel matching engine for efficient instruction matching. Evaluation demonstrates up to 9.27x speedup on real-world workloads, including point cloud processing and LLM inference.

Paper Structure

This paper contains 20 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the unified toolchain in Aquas.
  • Figure 2: Synthesis flow of gemv using Aquas. It consists of (a) CADL input, (b) MLIR parsed from CADL including aquas dialect, and (c) synthesized hardware including DMA engine, scratchpad memory, and main execution pipeline.
  • Figure 3: End-to-end workflow of the Aquas retargetable compiler. ❶ to ❽ correspond to the steps for compiling an application.