Improved Decision Module Selection for Hierarchical Inference in Resource-Constrained Edge Devices

Adarsh Prasad Behera; Roberto Morabito; Joerg Widmer; Jaya Prakash Champati

TL;DR

The paper addresses how to select the decision module in Hierarchical Inference (HI) on resource-constrained edge devices, where each sample is either handled by the local tinyML model or offloaded to an edge server or the cloud. It introduces three strategies: (i) calibrating the tinyML model via Temperature Scaling and applying a fixed threshold, (ii) running a classifier (LR, SVM, or RF) on the edge device after tinyML inference, and (iii) running a classifier before tinyML inference to pre-filter samples for offloading. The strategies are evaluated on CIFAR-10 with a 311 KB ResNetv1 model (S-ML) on the device and a ViT-H/14 model (L-ML) in the cloud. Across experiments, Logistic Regression after tinyML (LRA) generally achieves the lowest cost per image ($CPI$) and strong $F1$ scores, while full offload becomes optimal only when $\alpha$ nearly equals $\beta$; the benefit of pre-tinyML classifiers depends on the regime. The results offer practical guidance for choosing a robust decision strategy when deploying HI on MCUs, supporting energy-efficient, low-latency inference at the edge.
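As an illustration of strategy (i), the following is a minimal sketch of a fixed-threshold decision on temperature-scaled softmax confidence. It is not the authors' implementation: the function names, the example logits, the temperature, and the threshold are all hypothetical values chosen for illustration, and the sketch assumes that the decision compares the maximum calibrated class probability against the threshold.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature); temperature > 1 softens the output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hi_decision(logits, temperature, threshold):
    """Accept the local prediction or offload, based on calibrated confidence.

    Assumption (not from the source): a sample is offloaded when the largest
    calibrated class probability falls below the fixed threshold.
    """
    probs = softmax_with_temperature(logits, temperature)
    confidence = max(probs)
    predicted_class = probs.index(confidence)
    offload = confidence < threshold
    return predicted_class, confidence, offload


# Hypothetical logits from a small on-device model for one image.
example_logits = [4.0, 1.0, 0.0]

# Uncalibrated (T = 1): the model looks confident, so the sample stays local.
print(hi_decision(example_logits, temperature=1.0, threshold=0.8))

# With a hypothetical fitted temperature T = 3, confidence drops and the
# same sample is offloaded to the larger model.
print(hi_decision(example_logits, temperature=3.0, threshold=0.8))
```

In this sketch the temperature only rescales confidence and never changes the predicted class, so calibration affects which samples are offloaded rather than what the local model predicts.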

Abstract

The Hierarchical Inference (HI) paradigm employs tiered processing: inferences for simple data samples are accepted at the end device, while complex data samples are offloaded to central servers. HI has recently emerged as an effective method for balancing inference accuracy, data processing, transmission throughput, and offloading cost. The approach is particularly efficient for resource-constrained edge devices, such as IoT sensors and microcontroller units (MCUs), that execute tinyML inference. Notably, it outperforms strategies such as local inference execution, inference offloading to edge servers or cloud facilities, and split inference (i.e., inference execution distributed between two endpoints). Building on the HI paradigm, this work explores techniques for further optimizing inference task execution. We propose and discuss three distinct HI approaches and evaluate their utility for image classification.

Paper Structure (7 sections, 1 equation, 8 figures)

Introduction
Proposed Methodologies and Implementation
Calibration of TinyML with Fixed threshold
Use of Classifiers after TinyML inference
Use of Classifiers before TinyML inference
Results
Conclusion and Future works

Figures (8)

Figure 1: Various approaches for DL inference at the edge.
Figure 2: HI framework for DL inference at network edge.
Figure 3: Mean Accuracy vs Confidence in each bin with respective ECE before and after calibration.
Figure 4: Optimal threshold ($\theta^*$) selection.
Figure 5: HI framework with classifiers after the tinyML model on the edge device.
...and 3 more figures
