Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems

Samaresh Kumar Singh; Joyjit Roy

TL;DR

Explainability-as-a-Service (XaaS) reimagines XAI as a distributed system service for edge IoT, decoupling explanation generation from inference to address redundancy, resource mismatch, and reuse challenges. The architecture introduces semantic caching, lightweight verification, and adaptive explanation delivery across edge, fog, and cloud layers, with a request-routing flow that prioritizes cache hits and validates cached content. Empirical results across manufacturing, automotive, and healthcare scenarios show a 38% latency reduction, a 3.2x throughput increase, and fidelity above 0.92 while achieving 72% cache hit rates and maintaining robustness under model drift and network variability. By treating explainability as a system service, XaaS enables scalable, transparent, and regulator-ready edge AI deployments and opens avenues for federated, counterfactual, and personalized explanations in heterogeneous environments.

Abstract

Although Explainable AI (XAI) has advanced significantly, its integration into edge and IoT systems is typically ad hoc and inefficient. Most current methods are coupled: they generate explanations at the same time as model inferences. As a result, they incur redundant computation, high latency, and poor scalability when deployed across heterogeneous edge devices. In this work, we propose Explainability-as-a-Service (XaaS), a distributed architecture that treats explainability as a first-class system service rather than a model-specific feature. The central idea is to decouple inference from explanation generation, allowing edge devices to request, cache, and verify explanations subject to resource and latency constraints. To achieve this, we introduce three components: (1) a distributed explanation cache with semantic-similarity-based retrieval, which significantly reduces redundant computation; (2) a lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) an adaptive explanation engine that selects explanation methods based on device capability and user requirements. We evaluate XaaS on three real-world edge AI use cases: (i) manufacturing quality control, (ii) autonomous vehicle perception, and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across all three deployments. Overall, this work enables transparent and accountable AI across large-scale, heterogeneous IoT systems and bridges the gap between XAI research and practical edge deployment.
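
The decoupled request flow described in the abstract (look up a semantically similar cached explanation, verify it, and fall back to fresh generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the class and function names, the cosine-similarity lookup, the 0.95 threshold, and the prediction-agreement check standing in for the lightweight verification protocol are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class CacheEntry:
    embedding: List[float]
    prediction: int
    explanation: List[float]  # e.g. per-feature attribution scores


@dataclass
class ExplanationService:
    """Hypothetical explanation service, decoupled from the inference path."""
    explain: Callable[[Vector], List[float]]  # expensive explainer (stand-in)
    predict: Callable[[Vector], int]          # deployed model (stand-in)
    similarity_threshold: float = 0.95        # assumed value, not from the paper
    entries: List[CacheEntry] = field(default_factory=list)
    hits: int = 0
    misses: int = 0

    def _nearest(self, embedding: Vector) -> Tuple[Optional[CacheEntry], float]:
        # Linear scan for the most similar cached entry; a real deployment
        # would likely use an approximate nearest-neighbour index instead.
        best, best_sim = None, -1.0
        for entry in self.entries:
            sim = cosine_similarity(embedding, entry.embedding)
            if sim > best_sim:
                best, best_sim = entry, sim
        return best, best_sim

    def _verify(self, entry: CacheEntry, prediction: int) -> bool:
        # Assumed placeholder check: reuse a cached explanation only if the
        # model's current prediction matches the one it was generated for.
        return entry.prediction == prediction

    def request(self, embedding: Vector) -> List[float]:
        prediction = self.predict(embedding)
        entry, sim = self._nearest(embedding)
        if (entry is not None and sim >= self.similarity_threshold
                and self._verify(entry, prediction)):
            self.hits += 1
            return entry.explanation
        # Cache miss or failed verification: generate and store a new explanation.
        self.misses += 1
        explanation = self.explain(embedding)
        self.entries.append(CacheEntry(list(embedding), prediction, explanation))
        return explanation


# Usage with toy stand-ins for the model and the explainer.
service = ExplanationService(
    explain=lambda v: [abs(x) for x in v],
    predict=lambda v: int(v[0] > v[1]),
)
service.request([1.0, 0.0])    # miss: explanation is generated and cached
service.request([0.99, 0.01])  # near-duplicate input: served from the cache
print(service.hits, service.misses)
```

In this sketch the explainer runs only on a miss, which is where the reduction in redundant computation would come from; the threshold trades hit rate against the risk of reusing an explanation for an input that is not close enough.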

Paper Structure (33 sections, 8 equations, 5 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Explainable AI Methods
Edge AI and Low Latency Inference
XAI For Resource-Constrained Devices
Caching and Service Architecture
Summary and Differentiation
Problem Formulation
System Model
Explanation Methods
Explanation Requests
Naive Approach and Its Limitations
XaaS Objectives
Caching Validity Conditions
Optimization Challenges
...and 18 more sections

Figures (5)

Figure 1: XaaS System Architecture. The framework decouples inference from explanation generation, enabling edge devices to request, cache, and verify explanations efficiently.
Figure 2: Comparison of XaaS with baseline methods, showing average values of the primary performance metrics across three scenarios. XaaS achieves a 38% reduction in latency relative to the best-performing baseline (EdgeXAI) and an explanation fidelity of at least 0.92. XaaS also achieves a 3.2x increase in throughput, allowing it to process a significantly larger volume of requests.
Figure 3: Cache performance dynamics. (a) Hit rate stabilizes after the initial warm-up period, with manufacturing achieving the highest rates due to repetitive patterns. (b) Cache size vs. performance shows diminishing returns beyond 2000 entries. A 1000-entry cache achieves a 72% hit rate with 42 ms latency.
Figure 4: Scalability analysis. (a) XaaS latency grows sublinearly with device count (log scale), while baselines show near-linear degradation. (b) Under increasing load, XaaS maintains a >95% success rate up to 250 req/s; baselines degrade significantly beyond 150 req/s.
Figure 5: Ablation analysis demonstrating the impact of XaaS components. (a) Removing caching increases latency by 86%, removing verification decreases fidelity by 3.9%, and removing adaptive selection increases latency by 52%. (b) Lightweight verification achieves 95.5% detection accuracy at only 3.2% of the cost of full regeneration.
