Neural Architecture Search of Hybrid Models for NPU-CIM Heterogeneous AR/VR Devices

Yiwei Zhao, Ziyun Li, Win-San Khwa, Xiaoyu Sun, Sai Qian Zhang, Syed Shakib Sarwar, Kleber Hugo Stangherlin, Yi-Lun Lu, Jorge Tomas Gomez, Jae-Sun Seo, Phillip B. Gibbons, Barbara De Salvo, Chiao Liu

TL;DR

This work addresses the challenge of delivering accurate, low-latency AI on AR/VR edge devices by co-designing hybrid CNN/ViT models with heterogeneous hardware (NPU+CIM). It introduces H4H-NAS, a two-stage neural architecture search framework guided by a system profiler that uses real-silicon NPU measurements and CIM IP data to optimize both model architecture and device mapping. The results show modest but meaningful top-1 accuracy gains on ImageNet (up to 1.34%), along with substantial improvements in latency (up to 56.08%) and energy (up to 41.72%) when leveraging CIM parallelism, with additional gains from multi-CIM macro configurations. Overall, H4H-NAS provides a practical path to efficient, hardware-aware hybrid models for edge AR/VR workloads and offers design guidance for future NPU+CIM platforms.

Abstract

Low-latency and low-power edge AI is essential for Virtual Reality and Augmented Reality applications. Recent advances show that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can pose system challenges for latency and energy efficiency because of their diverse dataflow and memory access patterns. In this work, we leverage the architectural heterogeneity of Neural Processing Units (NPU) and Compute-In-Memory (CIM) and apply diverse execution schemas to execute these hybrid models efficiently. We also introduce H4H-NAS, a Neural Architecture Search framework to design efficient hybrid CNN/ViT models for heterogeneous edge systems with both NPU and CIM. Our H4H-NAS approach is powered by a performance estimator built with NPU performance results measured on real silicon and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN/ViT models with fine granularity and achieves a significant (up to 1.34%) top-1 accuracy improvement on the ImageNet dataset. Moreover, results from our Algo/HW co-design reveal up to 56.08% overall latency and 41.72% energy improvements over baseline solutions from introducing such heterogeneous computing. The framework guides the design of hybrid network architectures and system architectures of NPU+CIM heterogeneous systems.
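
The idea of an estimator-driven device mapping can be illustrated with a minimal Python sketch. It is not the H4H-NAS implementation: the lookup table, the layer names, the `map_layers` helper, and every cost value are hypothetical placeholders in arbitrary units, not measurements from the paper. The sketch also assumes layers run sequentially and ignores data movement between the NPU and CIM, which a real profiler would need to model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    latency: float  # arbitrary units (placeholder)
    energy: float   # arbitrary units (placeholder)


# Hypothetical per-layer cost table keyed by (layer type, device).
# All values are made-up placeholders, NOT numbers from the paper.
LOOKUP = {
    ("conv3x3", "npu"): Cost(1.0, 2.0),
    ("conv3x3", "cim"): Cost(1.5, 1.8),
    ("dwconv3x3", "npu"): Cost(0.8, 1.0),
    ("dwconv3x3", "cim"): Cost(0.4, 0.6),
    ("fc", "npu"): Cost(2.0, 3.0),
    ("fc", "cim"): Cost(0.5, 1.0),
}
DEVICES = ("npu", "cim")


def map_layers(layers, energy_weight=0.0):
    """Greedily assign each layer to the device with the lowest weighted cost.

    The cost of a candidate is latency + energy_weight * energy, so the
    default (energy_weight=0.0) optimizes for latency only.
    """
    mapping = []
    total_latency = 0.0
    total_energy = 0.0
    for layer in layers:
        best = min(
            DEVICES,
            key=lambda d: LOOKUP[(layer, d)].latency
            + energy_weight * LOOKUP[(layer, d)].energy,
        )
        cost = LOOKUP[(layer, best)]
        mapping.append((layer, best))
        total_latency += cost.latency
        total_energy += cost.energy
    return mapping, Cost(total_latency, total_energy)


# Example: a toy three-layer model described only by its layer types.
model = ["conv3x3", "dwconv3x3", "fc"]
mapping, total = map_layers(model)
print(mapping)
print(total)
```

In a search loop, an estimate like `total` could serve as the hardware cost term when ranking candidate architectures; raising `energy_weight` shifts the mapping toward lower-energy devices.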

Paper Structure

This paper contains 13 sections, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Silicon for testing ARM Ethos-U55 NPU.
  • Figure 2: Throughput and energy efficiency of Ethos-U55 NPU execution of different layers, normalized by the theoretical best performance on the U55. Conv and Dconv stand for regular convolution and depthwise convolution with (3,3)-kernels, respectively. PConv represents pointwise convolution with a (1,1)-kernel. FC refers to fully-connected layers.
  • Figure 3: Architecture layout of the MRAM CIM macro. I/OFMP stand for input/output feature maps.
  • Figure 4: The ratio of throughput and energy efficiency between a system with 8 CIM macros and a U55-only system when executing fully-connected layers and depthwise convolution with a (3,3)-kernel and a (32,32)-input.
  • Figure 5: An example of how our search space can be flexibly reduced to basic blocks of different existing model types.
  • ...and 6 more figures