High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration

Mohaned Chraiti, Ozgur Ercetin, Merve Saimler

TL;DR

This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models.

Abstract

To support the emergence of AI-as-a-Service (AIaaS), communication service providers (CSPs) are on the verge of a radical transformation: from pure connectivity providers to providers of AIaaS as a managed network service (a control-and-orchestration plane that exposes AI models). In this model, the CSP is responsible not only for transport/communications, but also for intent-to-model resolution and joint network-compute orchestration, i.e., reliable and timely end-to-end delivery. The resulting end-to-end AIaaS service is thus governed by both communication impairments (delay, loss) and inference impairments (latency, error). A central open problem is the design of an operational AIaaS control-and-orchestration framework that enforces high fidelity, particularly under multi-domain federation. This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models. Using stochastic network calculus, we derive bounds on end-to-end delay violation probabilities across tandem domains and obtain an optimization-ready risk-budget decomposition. We show that tenant-level reservations prevent bursty traffic from inflating tail latency under TRE contracts. An auditing layer then uses runtime telemetry to estimate extreme-percentile performance, quantify uncertainty, and attribute tail risk to each domain for accountability. Packet-level Monte Carlo simulations demonstrate improved p99.9 compliance under overload via admission control and robust tenant isolation under correlated burstiness.
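
The idea of splitting an end-to-end tail-risk budget across tandem domains can be illustrated with a plain union bound: if each domain d exceeds its own deadline with probability at most eps_d, then the end-to-end delay exceeds the summed deadline with probability at most the sum of the eps_d. The sketch below is only that generic argument, not the paper's stochastic-network-calculus derivation or its TRE schema. The class `DomainBudget`, the functions `compose_tandem` and `split_risk`, and all numeric values are hypothetical and chosen for illustration (a p99.9 target corresponds to an end-to-end violation probability of 1e-3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainBudget:
    """Hypothetical per-domain slice of an end-to-end contract (not the paper's TRE format)."""
    name: str
    deadline_ms: float      # per-domain delay deadline tau_d
    violation_prob: float   # assumed bound eps_d on P(delay_d > tau_d)


def compose_tandem(budgets):
    """Union bound over tandem domains.

    If every domain meets its own deadline, the total delay meets the summed
    deadline, so P(total > sum tau_d) <= sum eps_d (capped at 1).
    """
    total_deadline = sum(b.deadline_ms for b in budgets)
    total_risk = min(1.0, sum(b.violation_prob for b in budgets))
    return total_deadline, total_risk


def split_risk(total_risk, weights):
    """Split an end-to-end risk budget across domains in proportion to the given weights."""
    scale = sum(weights.values())
    return {name: total_risk * w / scale for name, w in weights.items()}


# Illustrative numbers only; none of them come from the paper.
budgets = [
    DomainBudget("access", 5.0, 2e-4),
    DomainBudget("transport", 10.0, 3e-4),
    DomainBudget("cloud-inference", 35.0, 5e-4),
]
deadline_ms, risk = compose_tandem(budgets)
shares = split_risk(1e-3, {"access": 1, "transport": 1, "cloud-inference": 2})
print(deadline_ms, risk, shares)
```

The union bound needs no independence assumption between domains, which makes it a conservative baseline; a tighter decomposition, such as the one the abstract describes, would use more structure from the per-domain models.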

Paper Structure

This paper contains 20 sections, 1 theorem, 25 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Under the arrival and negative-service MGF conditions (eq:arr_mgf, eq:neg_service_mgf), if $\Delta_{u,d}>0$, then for any deadline $\tau\ge T_d$,

Figures (4)

  • Figure 1: Federated AIaaS execution pipeline and assurance loop. The CSP AIaaS receives intents, selects model/placement, and provisions joint network-compute resources across a multi-stage path. Each administrative domain, including the Hyperscale Cloud Provider (HCP), exposes a TRE contract (signed, composable) rather than raw internal state. Telemetry feeds an audit layer for p99/p99.9 calibration.
  • Figure 2: Estimated p99.9 end-to-end delay versus normalized offered load $\rho$.
  • Figure 3: Isolation under burstiness.
  • Figure 4: Marginal tail-risk attribution under controlled degradation.

Theorems & Definitions (4)

  • Remark 1
  • Definition 1: Tail-Risk Envelope
  • Theorem 1: Single-domain delay-violation bound
  • proof