DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators

Taesik Gong, Fahim Kawsar, Chulhong Min

TL;DR

DEX addresses the memory-imposed input downsampling bottleneck in tiny AI accelerators by extending image information across additional data channels using patch-wise even sampling and channel-wise stacking. This approach fully leverages idle per-processor memory and parallel processors, preserving latency while improving CNN accuracy by about 3–4 percentage points across multiple models and datasets on MAX78000/78002. The work demonstrates modest increases in model size and substantial gains in information utilization, with an extensive on-device evaluation and ablations showing DEX outperforms simple downsampling and CoordConv baselines. By enabling richer on-device visual processing without latency penalties, DEX strengthens the practicality of deploying higher-quality CNNs on resource-constrained devices.

Abstract

Tiny machine learning (TinyML) aims to run ML models on small devices and is increasingly favored for its enhanced privacy, reduced latency, and low cost. Recently, the advent of tiny AI accelerators has revolutionized the TinyML field by significantly enhancing hardware processing power. These accelerators, equipped with multiple parallel processors and dedicated per-processor memory instances, offer substantial performance improvements over traditional microcontroller units (MCUs). However, their limited data memory often necessitates downsampling input images, resulting in accuracy degradation. To address this challenge, we propose Data channel EXtension (DEX), a novel approach for efficient CNN execution on tiny AI accelerators. DEX incorporates additional spatial information from original images into input images through patch-wise even sampling and channel-wise stacking, effectively extending data across input channels. By leveraging underutilized processors and data memory for channel extension, DEX facilitates parallel execution without increasing inference latency. Our evaluation with four models and four datasets on tiny AI accelerators demonstrates that this simple idea improves accuracy on average by 3.5%p while keeping the inference latency the same on the AI accelerator. The source code is available at https://github.com/Nokia-Bell-Labs/data-channel-extension.
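
The two steps named in the abstract, patch-wise even sampling and channel-wise stacking, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function name `dex_extend`, its parameters, the rule used to pick evenly spaced pixels inside a patch, and the order in which samples are stacked are all assumptions made for illustration.

```python
import numpy as np


def dex_extend(image, out_h, out_w, samples_per_patch):
    """Hypothetical sketch of data channel extension.

    image: array of shape (H, W, C), with H and W divisible by out_h and out_w.
    Returns an array of shape (out_h, out_w, samples_per_patch * C).
    """
    H, W, C = image.shape
    if H % out_h != 0 or W % out_w != 0:
        raise ValueError("image size must be divisible by the output size")
    ph, pw = H // out_h, W // out_w      # patch height and width
    n = ph * pw                          # pixels per patch
    if not 1 <= samples_per_patch <= n:
        raise ValueError("samples_per_patch must be between 1 and ph * pw")

    # Split the image into an (out_h, out_w) grid of patches and flatten
    # each patch to a list of n pixels in row-major order.
    patches = (
        image.reshape(out_h, ph, out_w, pw, C)
        .transpose(0, 2, 1, 3, 4)
        .reshape(out_h, out_w, n, C)
    )

    # Assumed sampling rule: evenly spaced, distinct positions within the
    # flattened patch.
    idx = (np.arange(samples_per_patch) * n) // samples_per_patch
    sampled = patches[:, :, idx, :]      # (out_h, out_w, k, C)

    # Stack the k samples of every patch along the channel axis.
    return sampled.reshape(out_h, out_w, samples_per_patch * C)


# Usage: an 8x8 RGB image reduced to 4x4, keeping all 4 pixels per patch.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
extended = dex_extend(image, out_h=4, out_w=4, samples_per_patch=4)
print(extended.shape)
```

With `samples_per_patch=1` the sketch reduces to plain strided downsampling, which keeps one pixel per patch. Larger values keep more of the original pixels at the same spatial size by spending them on extra input channels.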

Paper Structure

This paper contains 65 sections, 2 equations, 13 figures, 6 tables, 1 algorithm.

Figures (13)

  • Figure 1: The architecture of a tiny AI accelerator (MAX78000).
  • Figure 2: Comparison between an AI accelerator (MAX78000) and MCUs (MAX32650 and STM32F7).
  • Figure 3: Processor utilization with varying input channels on the AI accelerator.
  • Figure 4: Comparison among different input data: (a) an original image that exceeds the data memory limit of the AI accelerator, (b) a downsampled image that fits the data memory but does not fully utilize the parallel processors and data memory, and (c) a DEX-generated image that incorporates more information from the original image by extending data across channels, fully utilizing the parallel processors and data memory instances.
  • Figure 5: Overview of DEX. DEX divides the original image $I$ into multiple patches. DEX then evenly samples pixels from each patch $P_{ij}$ and constructs an output pixel $O_{ij}$ by stacking samples across channels.
  • ...and 8 more figures