AFarePart: Accuracy-aware Fault-resilient Partitioner for DNN Edge Accelerators
Mukta Debnath, Krishnendu Guha, Debasri Saha, Amlan Chakrabarti, Susmita Sur-Kolay
TL;DR
This work tackles the reliability challenge of partitioned DNN inference on heterogeneous edge accelerators by introducing AFarePart, a fault-aware partitioning framework that feeds fault-injection results into multi-objective NSGA-II optimization. The offline phase builds a Pareto front that balances latency, energy, and accuracy drop under faults, while the online phase dynamically reconfigures partitions during inference when resilience degrades. Key contributions include a fault-injection-aware partitioning method, the explicit use of accuracy degradation as an optimization objective, and demonstrated resilience gains (up to 27.7% improvement in fault tolerance) with modest latency and energy overhead across CNNs on edge hardware. This approach enables more robust AI inference in error-prone environments and paves the way for safer deployment of DNNs on resource-constrained systems. The results underscore the value of integrating resilience metrics into partitioning and highlight practical avenues for runtime adaptation on heterogeneous edge platforms.
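The offline/online split described above can be illustrated with a small sketch. This is not the authors' implementation: the `Partition` record, the objective values, and the `reconfigure` threshold rule are hypothetical. The sketch shows only two things. First, it extracts the rank-1 (non-dominated) front over latency, energy, and fault-induced accuracy drop, which is the first front NSGA-II's non-dominated sorting would produce. Second, it shows how a runtime monitor might switch to a more resilient member of that front.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    """Hypothetical candidate partition with three objectives, all minimized."""
    name: str
    latency_ms: float
    energy_mj: float
    acc_drop: float  # accuracy drop (percentage points) measured under injected faults

    def objectives(self) -> tuple:
        return (self.latency_ms, self.energy_mj, self.acc_drop)


def dominates(a: Partition, b: Partition) -> bool:
    """a dominates b: no worse on every objective and strictly better on at least one."""
    fa, fb = a.objectives(), b.objectives()
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


def pareto_front(cands: List[Partition]) -> List[Partition]:
    """Rank-1 front: candidates not dominated by any other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def reconfigure(front: List[Partition], current: Partition,
                observed_drop: float, max_drop: float) -> Partition:
    """Hypothetical online rule: keep the current partition while the observed
    accuracy drop stays within budget; otherwise move to the front member with
    the smallest accuracy drop, breaking ties by latency."""
    if observed_drop <= max_drop:
        return current
    return min(front, key=lambda p: (p.acc_drop, p.latency_ms))


# Illustrative values only, not results from the paper.
cands = [
    Partition("P1", 10.0, 5.0, 4.0),
    Partition("P2", 12.0, 6.0, 1.5),
    Partition("P3", 11.0, 5.5, 5.0),  # dominated by P1 on all three objectives
    Partition("P4", 9.0, 7.0, 3.0),
]
front = pareto_front(cands)
print([p.name for p in front])  # ['P1', 'P2', 'P4']
print(reconfigure(front, cands[0], observed_drop=6.0, max_drop=2.0).name)  # P2
```

Choosing the lowest-accuracy-drop member is only one possible policy. A real system could instead pick the lowest-latency point that still meets the accuracy budget.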
Abstract
Deep Neural Networks (DNNs) are increasingly deployed across distributed and resource-constrained platforms, such as System-on-Chip (SoC) accelerators and edge-cloud systems. DNNs are often partitioned and executed across heterogeneous processing units to optimize latency and energy. However, the reliability of these partitioned models under hardware faults and communication errors remains a critical yet underexplored topic, especially in safety-critical applications. In this paper, we propose an accuracy-aware, fault-resilient DNN partitioning framework that performs multi-objective optimization with NSGA-II and introduces accuracy degradation under fault conditions as a core metric alongside energy and latency. Our framework performs runtime fault injection during optimization and uses a feedback loop to prioritize fault-tolerant partitions. We evaluate our approach on benchmark CNNs, including AlexNet, SqueezeNet, and ResNet18, on hardware accelerators, and demonstrate up to 27.7% improvement in fault tolerance with minimal performance overhead. Our results highlight the importance of incorporating resilience into DNN partitioning, thereby paving the way for robust AI inference in error-prone environments.
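The fault-injection feedback can be sketched as follows. The sketch assumes a single-bit-flip fault model on float32 weights and uses a toy linear classifier in place of a CNN. The paper's actual fault model (hardware faults and communication errors) and its injection tooling may differ. `flip_bit`, `accuracy`, and `mean_accuracy_drop` are hypothetical helpers. The `targets` argument restricts injection to the weights mapped to one fault-prone processing unit. The returned mean drop could then serve as the accuracy objective for a candidate partition.

```python
import random
import struct
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {0, 1})


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of x's IEEE-754 float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def accuracy(w: Sequence[float], data: Sequence[Sample]) -> float:
    """Fraction of samples whose label matches the sign of the dot product w.x."""
    correct = sum(
        1 for x, y in data
        if (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1)
    )
    return correct / len(data)


def mean_accuracy_drop(w: Sequence[float], data: Sequence[Sample],
                       targets: Sequence[int], trials: int,
                       rng: random.Random) -> float:
    """Average accuracy loss over `trials` single-bit flips, each injected into a
    weight chosen from `targets` (hypothetically, the weights on one faulty unit)."""
    base = accuracy(w, data)
    total = 0.0
    for _ in range(trials):
        faulty: List[float] = list(w)
        i = rng.choice(targets)
        faulty[i] = flip_bit(faulty[i], rng.randrange(32))
        total += base - accuracy(faulty, data)
    return total / trials


# Toy setup: labels come from true_w itself, so the fault-free accuracy is 1.0.
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5, 0.25]
data: List[Sample] = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
    data.append((x, y))

drop_a = mean_accuracy_drop(true_w, data, targets=[0, 1], trials=100, rng=rng)
drop_b = mean_accuracy_drop(true_w, data, targets=[2, 3], trials=100, rng=rng)
print(f"weights 0-1 faulty: {drop_a:.3f}, weights 2-3 faulty: {drop_b:.3f}")
```

Comparing the two drops illustrates the feedback idea. A partition that places more fault-sensitive weights on an error-prone unit receives a worse accuracy objective, so the optimizer would tend to prefer alternatives.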
