Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices

Xu Wen, Wanling Gao, Lei Wang, Jianfeng Zhan

TL;DR

The paper tackles the cost of porting domain-specific frameworks across diverse hardware by introducing multi-layer abstractions: domain-specific abstractions for deep learning (DL), classical machine learning (CML), and data analysis (DA) workloads are converted into a single DAG-based unified abstraction. This unified representation is lowered to a minimal set of primitive operators and mapped to heterogeneous devices via TVM/LLVM backends, reducing porting complexity from $O(M\times N)$ to $O(M+N)$. The system supports X86, ARM, RISC-V, IoT devices, and GPU, and reports speedups over scikit-learn, hummingbird, Spark, and pandas of 1.1x to 3.83x on X86 servers, 1.06x to 4.33x on ARM IoT devices, 1.25x to 3.72x on RISC-V IoT devices, and 1.93x on GPU. The approach combines domain-specific and unified optimizations, reuses existing compiler frameworks to reduce engineering cost, and provides a runtime for cross-domain workloads.
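The lowering step described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the operator names, the `LOWERING` table, and the `lower` helper are hypothetical and are not taken from the paper's primitive operator set or its code base.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lowering table: each high-level operator expands into a
# sequence of primitive operators. The names are illustrative only.
LOWERING = {
    "filter": ["compare", "select"],      # a DA-style row filter
    "standard_scaler": ["sub", "div"],    # (x - mean) / std
    "linear": ["matmul", "add"],          # x @ W + b
}


def lower(dag):
    """Return the primitive operators of `dag` in dependency order.

    `dag` maps a node name to (high-level op, list of predecessor nodes).
    """
    deps = {name: set(preds) for name, (_, preds) in dag.items()}
    primitives = []
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        op, _ = dag[name]
        primitives.extend(LOWERING[op])
    return primitives


# A toy cross-domain pipeline: filter rows, scale features, apply a linear model.
pipeline = {
    "rows": ("filter", []),
    "scaled": ("standard_scaler", ["rows"]),
    "scores": ("linear", ["scaled"]),
}
primitive_ops = lower(pipeline)
```

In this toy setup a hardware backend only needs to implement the six primitive names that appear in `primitive_ops`, regardless of which domain the high-level operators came from.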

Abstract

The rapid development of domain-specific frameworks has presented us with a significant challenge: the current approach of implementing solutions on a case-by-case basis incurs a theoretical complexity of $O(M\times N)$, thereby increasing the cost of porting applications to different hardware platforms. To address this challenge, we propose a systematic methodology that effectively bridges the gap between domain-specific frameworks and multiple hardware devices, reducing porting complexity to $O(M+N)$. The approach utilizes multi-layer abstractions. Different domain-specific abstractions are employed to represent applications from various domains. These abstractions are then transformed into a unified abstraction, which is subsequently translated into combinations of primitive operators. Finally, these operators are mapped to multiple hardware platforms. The implemented unified framework supports deep learning, classical machine learning, and data analysis across X86, ARM, RISC-V, IoT devices, and GPU. It outperforms existing solutions like scikit-learn, hummingbird, Spark, and pandas, achieving impressive speedups: 1.1x to 3.83x on X86 servers, 1.06x to 4.33x on ARM IoT devices, 1.25x to 3.72x on RISC-V IoT devices, and 1.93x on GPU. The source code is available at https://github.com/BenchCouncil/bridger.git.
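The complexity claim can be made concrete by counting porting tasks. The sketch below is a hedged illustration, not the authors' accounting: it reads M as the number of frameworks and N as the number of hardware targets (the abstract does not define them), and the helper names `direct_ports` and `bridged_ports`, as well as the example lists, are hypothetical.

```python
from itertools import product


def direct_ports(frameworks, devices):
    """Case-by-case porting: one port per (framework, device) pair, M * N."""
    return list(product(frameworks, devices))


def bridged_ports(frameworks, devices):
    """Shared abstraction: one frontend per framework plus one backend
    per device, M + N."""
    frontends = [("frontend", f) for f in frameworks]
    backends = [("backend", d) for d in devices]
    return frontends + backends


# Illustrative inputs only; any M frameworks and N devices behave the same way.
frameworks = ["scikit-learn", "Spark", "pandas"]
devices = ["X86", "ARM", "RISC-V", "GPU"]

n_direct = len(direct_ports(frameworks, devices))    # 3 * 4 = 12
n_bridged = len(bridged_ports(frameworks, devices))  # 3 + 4 = 7
```

Adding a fifth device raises the direct count by M (three more ports here) but the bridged count by one, which is the gap between $O(M\times N)$ and $O(M+N)$.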

Paper Structure

This paper contains 23 sections, 1 equation, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: We propose a method to systematically bridge the gap between domain-specific frameworks and multiple hardware devices.
  • Figure 2: Design Overview.
  • Figure 3: System Architecture.
  • Figure 4: The coverage of primitive operators.
  • Figure 5: Performance on the X86 server. Absent data indicates an unsupported case.
  • ...and 3 more figures