Neural Architecture Search of Hybrid Models for NPU-CIM Heterogeneous AR/VR Devices

Yiwei Zhao; Ziyun Li; Win-San Khwa; Xiaoyu Sun; Sai Qian Zhang; Syed Shakib Sarwar; Kleber Hugo Stangherlin; Yi-Lun Lu; Jorge Tomas Gomez; Jae-Sun Seo; Phillip B. Gibbons; Barbara De Salvo; Chiao Liu

TL;DR

This work addresses the challenge of delivering accurate, low-latency AI on AR/VR edge devices by co-designing hybrid CNN/ViT models with heterogeneous hardware (NPU+CIM). It introduces H4H-NAS, a two-stage neural architecture search framework guided by a system profiler that draws on NPU measurements from real silicon and CIM data from industry IPs to optimize both model architecture and device mapping. The results show top-1 accuracy gains of up to 1.34% on ImageNet, along with latency reductions of up to 56.08% and energy improvements of up to 41.72% when leveraging CIM parallelism, with additional gains from multi-CIM macro configurations. Overall, H4H-NAS provides a practical path to efficient, hardware-aware hybrid models for edge AR/VR workloads and offers design guidance for future NPU+CIM platforms.

Abstract

Low-latency, low-power edge AI is essential for Virtual Reality and Augmented Reality applications. Recent advances show that hybrid models, which combine convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can pose system challenges for latency and energy efficiency because of their diverse dataflow and memory access patterns. In this work, we leverage the architectural heterogeneity of Neural Processing Units (NPU) and Compute-In-Memory (CIM) and apply diverse execution schemas to execute these hybrid models efficiently. We also introduce H4H-NAS, a Neural Architecture Search framework for designing efficient hybrid CNN/ViT models for heterogeneous edge systems with both NPU and CIM. H4H-NAS is powered by a performance estimator built from NPU performance results measured on real silicon and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN/ViT models at fine granularity and achieves a significant (up to 1.34%) top-1 accuracy improvement on the ImageNet dataset. Moreover, results from our Algo/HW co-design reveal up to 56.08% overall latency and 41.72% energy improvements over baseline solutions from introducing such heterogeneous computing. The framework guides the design of both hybrid network architectures and the system architectures of NPU+CIM heterogeneous systems.
