Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Haoran Zhang, Seohyeon Cha, Hasan Burhan Beytur, Kevin S Chan, Gustavo de Veciana, Haris Vikalo

TL;DR

A variance-reduced EXP4-based algorithm integrated with Lyapunov optimization is developed, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.

Abstract

Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the hierarchy. To address this, we develop a variance-reduced EXP4-based algorithm integrated with Lyapunov optimization, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. We provide regret guarantees relative to the best fixed routing policy in hindsight and establish near-optimality under stochastic arrivals and resource constraints. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.
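
To make the variance amplification concrete, here is a minimal sketch under stated assumptions. It models the chance that a task reaches the terminal oracle as the product of per-layer offload probabilities, and compares a plain importance-weighted estimate with a generic baseline-subtracted (control-variate) estimate. The function names, the fixed baseline, and the numbers are hypothetical illustrations; this is not the paper's estimator, whose exact form is not given in this summary.

```python
import random


def observe_prob(offload_probs):
    """Probability of reaching the terminal layer, modeled (as an assumption)
    as the product of the per-layer offload probabilities along the path."""
    p = 1.0
    for q in offload_probs:
        p *= q
    return p


def iw_estimate(loss, observed, p, baseline=0.0):
    """Importance-weighted loss estimate with an optional baseline.

    baseline=0.0 is the plain importance-weighted estimator. Any baseline
    chosen independently of `observed` keeps the estimate unbiased:
    E[estimate] = baseline + p * (loss - baseline) / p = loss.
    """
    correction = (loss - baseline) / p if observed else 0.0
    return baseline + correction


def iw_variance(loss, p, baseline=0.0):
    """Exact variance of iw_estimate over the Bernoulli(p) observation."""
    return (loss - baseline) ** 2 * (1.0 - p) / p


def simulate_mean(loss, p, baseline, trials, seed=0):
    """Monte Carlo average of the estimate, to check unbiasedness empirically."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += iw_estimate(loss, rng.random() < p, p, baseline)
    return total / trials


if __name__ == "__main__":
    loss = 0.8  # hypothetical true loss of one task
    for depth in (1, 2, 3, 4):
        p = observe_prob([0.3] * depth)  # hypothetical offload probability 0.3 per layer
        plain = iw_variance(loss, p)
        reduced = iw_variance(loss, p, baseline=0.7)
        print(f"depth={depth} p={p:.4f} plain_var={plain:.2f} baseline_var={reduced:.4f}")
```

In this toy model the plain estimator's variance scales as (1 - p) / p, so it grows quickly as p shrinks with depth, while a baseline close to the true loss shrinks the numerator without introducing bias.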

Paper Structure

This paper contains 27 sections, 7 theorems, 91 equations, 2 figures, 4 tables, 2 algorithms.

Key Result

Proposition 4.1

For each node $n\in\mathcal{N}\setminus\mathcal{N}_1$, if the virtual queue $Q_{n}(t)$ is mean-rate stable, i.e., $\lim_{T \to \infty} \frac{\mathbb{E}[Q_{n}(T)]}{T} = 0$, then the resource constraint in Eq. \ref{eq:constraint_offload} is satisfied.
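
The link between queue stability and constraint satisfaction can be illustrated with a short sketch. It assumes the standard virtual-queue update from Lyapunov optimization, $Q(t+1) = \max(Q(t) + \text{usage}(t) - \text{budget}, 0)$; the paper's exact update and constraint are not reproduced in this summary, so the function name, the budget, and the usage sequence below are hypothetical.

```python
import random


def run_virtual_queue(usages, budget):
    """Track a virtual queue under the assumed standard update
    Q(t+1) = max(Q(t) + usage(t) - budget, 0), starting from Q(0) = 0.

    Returns the queue length after each step.
    """
    q = 0.0
    history = []
    for u in usages:
        q = max(q + u - budget, 0.0)
        history.append(q)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    budget = 0.5  # hypothetical per-slot resource budget at one node
    # Hypothetical usage: the node receives an offloaded task with probability 0.4.
    usages = [1.0 if rng.random() < 0.4 else 0.0 for _ in range(10_000)]
    history = run_virtual_queue(usages, budget)
    horizon = len(usages)
    average_usage = sum(usages) / horizon
    # Since Q(t+1) >= Q(t) + usage(t) - budget at every step, summing gives
    # average_usage <= budget + Q(T) / T on every sample path.
    print(f"average usage = {average_usage:.3f}")
    print(f"budget + Q(T)/T = {budget + history[-1] / horizon:.3f}")
```

Because the time-averaged usage exceeds the budget by at most $Q(T)/T$, a queue whose expected length grows sublinearly in $T$ forces the long-run average usage to stay within the budget, which is the intuition behind the proposition.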

Figures (2)

  • Figure 1: Hierarchical inference with multi-destination offloading. Routing decisions couple expected inference loss with upstream resource consumption. Prediction error is observed only at the terminal layer, resulting in policy-dependent feedback.
  • Figure 2: 3-layer hierarchy (4-2-1). Entropy of expert weights, averaged across all nodes.

Theorems & Definitions (15)

  • Remark 3.2: Partial Feedback
  • Proposition 4.1
  • Lemma 4.2: Reduced Variance
  • Theorem 4.3: Regret Bound
  • Corollary 4.4: Near-Optimality of VR-Ly-EXP4
  • proof
  • proof
  • Lemma C.1: Unbiasedness of the Estimator
  • proof: Proof of Lemma C.1
  • Lemma C.2: Bounded Moments for Multi-Tier Queues
  • ...and 5 more