Obsidian: Cooperative State-Space Exploration for Performant Inference on Secure ML Accelerators

Sarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani, Mohit Tiwari

TL;DR

Obsidian is an optimization framework that finds the optimal mapping from ML kernels to a secure ML accelerator by exploring the state space with analytical and cycle-accurate models cooperatively.

Abstract

Trusted execution environments (TEEs) for machine learning accelerators are indispensable for secure and efficient ML inference. Optimizing workloads through state-space exploration for the accelerator architectures improves performance and energy consumption. However, such explorations are expensive and slow due to the large search space. Current research has to rely on fast analytical models that forgo critical hardware details and cross-layer opportunities unique to the hardware security primitives. While cycle-accurate models can theoretically reach better designs, their high runtime cost restricts them to a smaller state space. We present Obsidian, an optimization framework for finding the optimal mapping from ML kernels to a secure ML accelerator. Obsidian addresses the above challenge by exploring the state space using analytical and cycle-accurate models cooperatively. The two main exploration components are: (1) a secure accelerator analytical model that includes the effect of secure hardware while traversing the large mapping state space and produces the best m model mappings; (2) a compiler profiling step on a cycle-accurate model that captures runtime bottlenecks to further improve execution runtime, energy, and resource utilization and find the optimal model mapping. We compare our results to a baseline secure accelerator comprising the state-of-the-art security schemes obtained from GuardNN [33] and Sesame [11]. The analytical model reduces the inference latency by 20.5% for a cloud deployment and 8.4% for an edge deployment, with energy improvements of 24% and 19%, respectively. The cycle-accurate model further reduces the latency by 9.1% for a cloud deployment and 12.2% for an edge deployment, with energy improvements of 13.8% and 13.1%, respectively.
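
The cooperative idea in the abstract (a cheap model prunes a large space, an expensive model examines only the survivors) can be illustrated with a small sketch. This is not Obsidian's implementation: the mapping representation, both cost functions, the 64-entry buffer limit, and the name `cooperative_search` are hypothetical stand-ins chosen for illustration, and the second stage here only re-ranks the shortlist, whereas the paper's second stage also applies compiler profiling to improve the mappings.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Tuple

# Hypothetical mapping representation: (tile_size, loop_order_id).
Mapping = Tuple[int, int]


def cooperative_search(
    candidates: Iterable[Mapping],
    fast_cost: Callable[[Mapping], float],
    slow_cost: Callable[[Mapping], float],
    m: int,
) -> Tuple[Mapping, List[Mapping]]:
    """Rank every candidate with the fast model, then evaluate only the
    best m candidates with the slow model and return the winner."""
    shortlist = heapq.nsmallest(m, candidates, key=fast_cost)
    best = min(shortlist, key=slow_cost)
    return best, shortlist


def analytical_cost(mapping: Mapping) -> float:
    """Toy stand-in for an analytical model: larger tiles mean less
    memory traffic; runtime stalls are not modeled."""
    tile, order = mapping
    return 1024 / tile + 2 * order


def cycle_accurate_cost(mapping: Mapping) -> float:
    """Toy stand-in for a cycle-accurate model: adds a stall penalty when
    the tile exceeds an assumed 64-entry on-chip buffer."""
    tile, _ = mapping
    stall = 40 if tile > 64 else 0
    return analytical_cost(mapping) + stall


# Small illustrative search space: 4 tile sizes x 3 loop orders.
candidates = list(itertools.product([16, 32, 64, 128], [0, 1, 2]))
best, shortlist = cooperative_search(
    candidates, analytical_cost, cycle_accurate_cost, m=4
)
analytical_only_best = min(candidates, key=analytical_cost)
```

In this toy space the analytical model alone prefers the largest tile, while the slower model, run on just four candidates instead of twelve, picks a smaller tile that avoids the assumed stall. The shortlist size m trades second-stage runtime against the chance that the true optimum is pruned too early.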

Paper Structure

This paper contains 35 sections, 3 equations, 8 figures, 1 table, 2 algorithms.

Figures (8)

  • Figure 1: Design space exploration is performed by ordering and tiling tensor dimensions. The optimal mapping is executed on a secure accelerator. The green blocks show the secure hardware, while the datapath is shown in yellow.
  • Figure 2: The different optimization phases of Obsidian
  • Figure 3: The boxplot shows the performance of 20 model mappings with layer maps chosen from the top-k output of the tiling and loop order exploration.
  • Figure 4: Memory traffic reduction after hash granularity optimization. $\{T_D,T_R,T_V\}$ are demand, redundant and hash memory requests.
  • Figure 5: EDP of top-m model mappings generated by simulated annealing. The result is normalized against the model map with all the top layer mappings.
  • ...and 3 more figures