Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Haoran Zhang; Seohyeon Cha; Hasan Burhan Beytur; Kevin S Chan; Gustavo de Veciana; Haris Vikalo

TL;DR

A variance-reduced EXP4-based algorithm integrated with Lyapunov optimization is developed, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.

Abstract

Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the hierarchy. To address this, we develop a variance-reduced EXP4-based algorithm integrated with Lyapunov optimization, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. We provide regret guarantees relative to the best fixed routing policy in hindsight and establish near-optimality under stochastic arrivals and resource constraints. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.
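The variance amplification described above can be illustrated with a small simulation. This is a minimal sketch under a hypothetical assumption: the terminal feedback for a task reaches a node only if every downstream layer independently offloads it (here with probability 0.5 per layer), so the observation probability $p$ is the product of the per-layer offload probabilities. It shows the standard importance-weighted estimator, not the paper's variance-reduced one. The estimate $\ell \cdot \mathbb{1}\{\text{observed}\}/p$ stays unbiased, but its variance, $\ell^2 (1-p)/p$, grows as $p$ decays with depth.

```python
import random
from statistics import mean, pvariance


def observation_probability(offload_probs):
    """Probability that terminal feedback reaches a node, assuming (hypothetically)
    independent offload decisions at each downstream layer."""
    p = 1.0
    for q in offload_probs:
        p *= q
    return p


def iw_estimates(loss, p, n_samples, rng):
    """Standard importance-weighted estimates: loss / p if feedback is observed, else 0."""
    return [loss / p if rng.random() < p else 0.0 for _ in range(n_samples)]


def analytic_variance(loss, p):
    """Var[loss * 1{observed} / p] = loss^2 * (1 - p) / p."""
    return loss ** 2 * (1.0 - p) / p


if __name__ == "__main__":
    rng = random.Random(0)
    loss = 0.3  # hypothetical true inference loss
    for depth in range(1, 5):
        p = observation_probability([0.5] * depth)
        est = iw_estimates(loss, p, 200_000, rng)
        print(f"depth={depth} p={p:.4f} mean={mean(est):.4f} "
              f"empirical_var={pvariance(est):.3f} "
              f"analytic_var={analytic_variance(loss, p):.3f}")
```

Each extra layer halves $p$ in this toy setting, so the empirical mean stays near 0.3 while the variance roughly doubles per layer, which is the instability the variance-reduced estimator is meant to address.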

Paper Structure (27 sections, 7 theorems, 91 equations, 2 figures, 4 tables, 2 algorithms)

Introduction
Related Work
System Model
Methodology
Lyapunov Resource Optimization
Hierarchical Routing Bandits
Variance-Reduced Loss Estimation
Theoretical Guarantees
Greedy Model Onloading
Experiments
Conclusion
Algorithm Pseudo Code
Pseudo Code of VR-Ly-EXP4
Pseudo Code of Greedy Model Placement
Additional Experimental Details and Results
...and 12 more sections

Key Result

Proposition 4.1

For each node $n\in\mathcal{N}\setminus\mathcal{N}_1$, if the virtual queue $Q_n(t)$ is mean-rate stable, i.e., $\lim_{T \to \infty} \mathbb{E}[Q_n(T)]/T = 0$, then the resource constraint in Eq. eq:constraint_offload is satisfied.
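Proposition 4.1 follows the usual virtual-queue argument from Lyapunov optimization. The sketch below assumes the standard update $Q_n(t+1) = \max\{Q_n(t) + c_n(t) - B_n, 0\}$, where $c_n(t)$ (per-slot resource consumption) and $B_n$ (per-slot budget) are hypothetical names rather than the paper's notation. Since $Q_n(T) \ge Q_n(0) + \sum_{t<T} (c_n(t) - B_n)$, the expected time-average excess consumption is at most $\mathbb{E}[Q_n(T) - Q_n(0)]/T$, which vanishes under mean-rate stability. The simulation contrasts a budget-respecting load with a violating one.

```python
import random


def update_virtual_queue(q, consumption, budget):
    """Standard virtual-queue update: Q(t+1) = max(Q(t) + c(t) - B, 0)."""
    return max(q + consumption - budget, 0.0)


def run_queue(consumption_fn, budget, horizon, seed=0):
    """Simulate the queue for `horizon` slots; return (Q(T), average consumption)."""
    rng = random.Random(seed)
    q = 0.0
    total = 0.0
    for _ in range(horizon):
        c = consumption_fn(rng)
        total += c
        q = update_virtual_queue(q, c, budget)
    return q, total / horizon


if __name__ == "__main__":
    budget = 0.4  # hypothetical per-slot budget
    T = 100_000
    # Mean consumption 0.35 < budget: the queue stays bounded, so Q(T)/T -> 0.
    q_ok, avg_ok = run_queue(lambda r: 0.7 * r.random(), budget, T)
    # Mean consumption 0.5 > budget: Q(T)/T tracks the excess rate of about 0.1.
    q_bad, avg_bad = run_queue(lambda r: r.random(), budget, T)
    print(f"within budget: Q(T)/T={q_ok / T:.5f} avg={avg_ok:.4f}")
    print(f"over budget:   Q(T)/T={q_bad / T:.5f} avg={avg_bad:.4f}")
```

In both runs the inequality avg consumption minus budget at most $Q(T)/T$ holds exactly, because $Q(0) = 0$ and the max only adds mass to the queue.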

Figures (2)

Figure 1: Hierarchical inference with multi-destination offloading. Routing decisions couple expected inference loss with upstream resource consumption. Prediction error is observed only at the terminal layer, resulting in policy-dependent feedback.
Figure 2: 3-layer hierarchy (4-2-1). Entropy of expert weights, averaged across all nodes.
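Figure 2 tracks the entropy of expert weights. The sketch below shows how that metric behaves under a plain EXP4-style update: exponential weights over experts, uniform exploration rate gamma, and a standard importance-weighted bandit loss estimate. It omits the paper's variance-reduced estimator and Lyapunov penalty, and the expert advice, action losses, eta, and gamma are all hypothetical. Entropy starts at $\log K$ for $K$ experts and falls as weight concentrates on the best expert.

```python
import math
import random


def normalized(weights):
    s = sum(weights)
    return [w / s for w in weights]


def entropy(probs):
    """Shannon entropy (nats) of a probability vector; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expert_distribution(log_w):
    m = max(log_w)  # subtract the max for numerical stability
    return normalized([math.exp(lw - m) for lw in log_w])


def exp4_step(log_w, advice, losses, eta, gamma, rng):
    """One EXP4-style round with deterministic expert advice.

    advice[i] is the action expert i recommends; losses are true per-action
    losses in [0, 1], of which only the chosen action's loss is revealed.
    """
    n_actions = len(losses)
    q = expert_distribution(log_w)
    # Action distribution: advice mixed by expert weights plus uniform exploration.
    p = [gamma / n_actions] * n_actions
    for qi, a in zip(q, advice):
        p[a] += (1.0 - gamma) * qi
    chosen = rng.choices(range(n_actions), weights=p)[0]
    # Importance-weighted estimate: nonzero only for the chosen action.
    est = [0.0] * n_actions
    est[chosen] = losses[chosen] / p[chosen]
    # Each expert is charged the estimated loss of the action it recommended.
    return [lw - eta * est[a] for lw, a in zip(log_w, advice)]


def run_exp4(rounds=2000, eta=0.05, gamma=0.1, seed=0):
    rng = random.Random(seed)
    advice = [0, 1, 2, 1]     # hypothetical context-free expert advice
    losses = [0.1, 0.9, 0.9]  # hypothetical per-action losses
    log_w = [0.0] * len(advice)
    history = []
    for _ in range(rounds):
        log_w = exp4_step(log_w, advice, losses, eta, gamma, rng)
        history.append(entropy(expert_distribution(log_w)))
    return log_w, history


if __name__ == "__main__":
    log_w, history = run_exp4()
    print(f"initial entropy log(4) = {math.log(4):.3f}")
    for t in (0, 49, 199, 1999):
        print(f"round {t + 1}: entropy = {history[t]:.4f}")
```

A low averaged entropy therefore indicates that each node has committed to a small set of routing experts; a noisy estimator slows this concentration or makes it unstable.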

Theorems & Definitions (15)

Remark 3.2: Partial Feedback
Proposition 4.1
Lemma 4.2: Reduced Variance
Theorem 4.3: Regret Bound
Corollary 4.4: Near-Optimality of VR-Ly-EXP4
Proof
Proof
Lemma C.1: Unbiasedness of the Estimator
Proof of Lemma C.1
Lemma C.2: Bounded Moments for Multi-Tier Queues
...and 5 more