
Inference Offloading for Cost-Sensitive Binary Classification at the Edge

Vishnu Narayanan Moothedath, Umang Agarwal, Umeshraja N, James Richard Gross, Jaya Prakash Champati, Sharayu Moharir

TL;DR

This work tackles cost-sensitive binary classification at the edge by coupling a lightweight local model with a costly but accurate remote model in a Hierarchical Inference (HI) framework. It derives a Bayes-optimal two-threshold rule for calibrated LDLs and introduces H2T2, an online two-threshold HI policy that learns its thresholds during inference from limited feedback. H2T2 achieves sublinear regret: over $T$ rounds, $R_T \le (\epsilon\beta + \tfrac{\eta}{2\epsilon})T + \tfrac{\ln|\Theta|}{\eta}$, and optimal choices of $\epsilon$ and $\eta$ yield $R_T = O(T^{2/3})$. The approach is model-agnostic and does not require retraining the LDL, while systematically balancing local decisions against offloading costs under asymmetric FP/FN costs. Empirical results on multiple real-world and synthetic datasets show substantial cost reductions, strong gains over single-threshold HI, and robustness to distribution shifts and out-of-distribution data, including notable gains on medical imaging and cybersecurity benchmarks. The framework also extends to multiclass settings, where calibrated models yield region-based decision rules and uncalibrated cases can be handled with a learned boundary-expert portfolio, enabling practical, inference-time cost efficiency in edge deployments.
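To see how the regret bound trades the $\epsilon\beta$ term against the learning-rate terms, the sketch below evaluates its right-hand side and tunes $\eta$ and then $\epsilon$ in closed form. Reading $\epsilon$ as an exploration probability in (0, 1], $\eta$ as a learning rate, $\beta$ as the per-offload cost and $|\Theta|$ as a fixed number of candidate threshold pairs is an assumption; the summary states only the bound. The function names are hypothetical.

```python
import math


def regret_bound(T: int, eps: float, eta: float, beta: float, n_experts: int) -> float:
    """Right-hand side of R_T <= (eps*beta + eta/(2*eps))*T + ln|Theta|/eta."""
    return (eps * beta + eta / (2 * eps)) * T + math.log(n_experts) / eta


def tuned_params(T: int, beta: float, n_experts: int) -> tuple[float, float]:
    """Minimise the bound over eta for fixed eps, then over eps.

    Assumes beta > 0 and n_experts >= 2. Setting the eta-derivative to zero gives
    eta = sqrt(2*eps*ln|Theta| / T); substituting back leaves
    eps*beta*T + sqrt(2*T*ln|Theta| / eps), minimised at
    eps = (ln|Theta| / (2*beta**2*T))**(1/3), which scales as T**(-1/3).
    """
    log_n = math.log(n_experts)
    # Clamp to 1 because eps is treated here as a probability (assumption).
    eps = min(1.0, (log_n / (2 * beta**2 * T)) ** (1 / 3))
    eta = math.sqrt(2 * eps * log_n / T)
    return eps, eta


if __name__ == "__main__":
    for T in (10**4, 10**6, 10**8):
        eps, eta = tuned_params(T, beta=0.4, n_experts=66)
        ratio = regret_bound(T, eps, eta, beta=0.4, n_experts=66) / T ** (2 / 3)
        # The ratio does not change with T: the tuned bound grows as T^(2/3).
        print(T, round(eps, 4), ratio)
```

Because every term of the tuned bound is proportional to $T^{2/3}$, the average regret per sample, $R_T / T$, decays as $T^{-1/3}$.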

Abstract

We focus on a binary classification problem in an edge intelligence system where false negatives are more costly than false positives. The system has a compact, locally deployed model, supplemented by a larger remote model that is accessible over the network at an offloading cost. For each sample, our system first uses the locally deployed model for inference. Based on the output of the local model, the sample may be offloaded to the remote model. This work aims to understand the fundamental trade-off between classification accuracy and the offloading costs within such a hierarchical inference (HI) system. To optimise this system, we propose an online learning framework that continuously adapts a pair of thresholds on the local model's confidence scores. These thresholds determine the prediction of the local model and whether a sample is classified locally or offloaded to the remote model. We present a closed-form solution for the setting where the local model is calibrated. For the more general case of uncalibrated models, we introduce H2T2, an online two-threshold hierarchical inference policy, and prove it achieves sublinear regret. H2T2 is model-agnostic, requires no training, and learns during the inference phase using limited feedback. Simulations on real-world datasets show that H2T2 consistently outperforms naive and single-threshold HI policies, sometimes even surpassing offline optima. The policy also demonstrates robustness to distribution shifts and adapts effectively to mismatched classifiers.
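The abstract describes a pair of thresholds on the local model's confidence score that decides both the local prediction and whether to offload. A minimal sketch of one such rule follows. It assumes the score f is the local model's confidence that the sample is positive, labels lie in {-1, +1}, the remote model is always correct, and the normalised false-positive, false-negative and offload costs are passed in directly; the function names are hypothetical.

```python
from typing import Literal

Decision = Literal["neg", "offload", "pos"]


def two_threshold_decision(f: float, theta_lo: float, theta_hi: float) -> Decision:
    """Hypothetical two-threshold rule on a confidence score f in [0, 1].

    Below theta_lo the local model predicts -1, at or above theta_hi it
    predicts +1, and in between the sample is offloaded. With
    theta_lo == theta_hi nothing is offloaded and the rule is a purely
    local classifier with decision threshold theta_hi.
    """
    if theta_lo > theta_hi:
        raise ValueError("need theta_lo <= theta_hi")
    if f < theta_lo:
        return "neg"
    if f >= theta_hi:
        return "pos"
    return "offload"


def incurred_cost(decision: Decision, y: int, beta: float,
                  delta_fp: float, delta_fn: float) -> float:
    """Cost of one decision for true label y in {-1, +1} (remote model assumed correct)."""
    if decision == "offload":
        return beta
    if decision == "pos" and y == -1:
        return delta_fp  # false positive
    if decision == "neg" and y == 1:
        return delta_fn  # false negative
    return 0.0


if __name__ == "__main__":
    print(two_threshold_decision(0.45, 0.3, 0.57))  # offload
    print(incurred_cost("neg", 1, beta=0.3, delta_fp=0.7, delta_fn=1.0))  # 1.0
```

Keeping the thresholds as two independent parameters is what lets the learner move the offload band and the local decision boundary separately, which matters when false negatives cost more than false positives.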

Paper Structure

This paper contains 21 sections, 6 theorems, 35 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Let $x_t$ denote the sample input to a calibrated LDL. Then, the optimal predictor $h_l^*(x_t)$ is given by [equation omitted]. Further, with a known $\beta_t$, the optimal offloading decision for this setup is to offload the sample when [condition omitted], with an associated expected loss of [expression omitted].
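For intuition about the form of Theorem 1, the sketch below applies the textbook cost-sensitive Bayes decision with an offload (reject) option. It assumes that calibration means $f_t = P(y_t = +1 \mid x_t)$, that the false-positive and false-negative costs are known constants, and that offloading costs $\beta_t$ with the remote model always correct. These are assumptions, not the theorem's exact statement. Under them, predicting +1 has expected cost $\delta_{FP}(1 - f_t)$, predicting -1 has expected cost $\delta_{FN} f_t$, and offloading is optimal whenever $\beta_t$ is below both. The function names are hypothetical.

```python
def bayes_decision(f: float, beta: float, delta_fp: float,
                   delta_fn: float) -> tuple[str, float]:
    """Expected-cost-minimising action for a calibrated score f = P(y=+1 | x)."""
    risks = {
        "pos": delta_fp * (1.0 - f),  # wrong only if y = -1
        "neg": delta_fn * f,          # wrong only if y = +1
        "offload": beta,              # remote model assumed correct
    }
    action = min(risks, key=risks.get)
    return action, risks[action]


def offload_interval(beta: float, delta_fp: float, delta_fn: float) -> tuple[float, float]:
    """Scores with beta < min(delta_fp*(1-f), delta_fn*f) lie in (lo, hi); empty if lo >= hi."""
    return beta / delta_fn, 1.0 - beta / delta_fp


if __name__ == "__main__":
    # Cost values quoted in the Figure 2 caption: FP 0.7, FN 1, offload 0.3.
    print(offload_interval(0.3, 0.7, 1.0))
    for f in (0.1, 0.45, 0.9):
        print(f, bayes_decision(f, 0.3, 0.7, 1.0))
```

With these costs the offload interval is roughly (0.30, 0.57). Under the stated assumptions, the calibrated optimum is itself a two-threshold rule: predict -1 below the interval, offload inside it, and predict +1 above it. When the interval is empty, nothing is offloaded and the rule collapses to predicting +1 iff $f_t \ge \delta_{FP} / (\delta_{FP} + \delta_{FN})$.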

Figures (7)

  • Figure 1: Architecture of the proposed HI Hedge with Two Thresholds (H2T2) offload decision-making policy.
  • Figure 2: FPR vs. FNR and average cost of single- and two-threshold policies on (a) BreakHis and (b) the synthetic configurations listed in the dataset table. Normalised costs of false positive, false negative, and offload are 0.7, 1, and 0.3, respectively.
  • Figure 3: Illustration of the invalid region and the three valid regions of experts defined by the LDL predictor equation, relative to an observed $f_t$.
  • Figure 4: Average cost of the H2T2 policy vs. fixed offloading cost $\beta$ for different datasets. Panels (a)-(d) are for in-distribution data, and (e) is for out-of-distribution (OOD) data.
  • Figure 5: Average cost vs. learning rate $\eta$ with $\beta\!=\!0.4, \delta_1\!=\!0.7,\delta_{-1}\!=\!1, T\!=\!10000$. Similar trends were observed throughout the range of $\beta\in (0,1)$.
  • ...and 2 more figures
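Figure 1 names the policy a Hedge variant over two thresholds, Figure 3 partitions the valid experts relative to an observed $f_t$, and Figure 5 varies the learning rate $\eta$. The sketch below is a generic label-efficient Hedge learner over a grid of valid threshold pairs ($\theta_{lo} \le \theta_{hi}$), meant only to illustrate those ingredients; it is not the paper's H2T2 algorithm. It assumes the label is revealed only when a sample is offloaded, explores by offloading with probability $\epsilon$, and on exploration rounds updates every expert with importance-weighted losses $\ell/\epsilon$, one standard construction that yields bounds shaped like the one in the TL;DR. The grid step, cost model, synthetic data and all names are hypothetical.

```python
import math
import random


def make_experts(step: float = 0.1) -> list[tuple[float, float]]:
    """All valid threshold pairs (lo, hi) with lo <= hi on a uniform grid over [0, 1]."""
    n = round(1 / step)
    grid = [i / n for i in range(n + 1)]
    return [(lo, hi) for lo in grid for hi in grid if lo <= hi]


def expert_cost(f, y, lo, hi, beta, delta_fp, delta_fn):
    """Cost the expert (lo, hi) would incur on score f with true label y in {-1, +1}."""
    if lo <= f < hi:
        return beta                       # offload (remote model assumed correct)
    pred = 1 if f >= hi else -1
    if pred == y:
        return 0.0
    return delta_fp if pred == 1 else delta_fn


def hedge_two_thresholds(stream, beta, delta_fp, delta_fn, eta, eps, step=0.1, seed=0):
    """Label-efficient Hedge over threshold pairs; returns (average cost, final weights)."""
    rng = random.Random(seed)
    experts = make_experts(step)
    log_w = [0.0] * len(experts)          # log-weights avoid underflow
    total, t = 0.0, 0
    for f, y in stream:
        t += 1
        m = max(log_w)
        probs = [math.exp(v - m) for v in log_w]
        lo, hi = rng.choices(experts, weights=probs)[0]  # sample one expert
        if rng.random() < eps:
            # Exploration round: offload, observe y, update all experts
            # with importance-weighted losses (unbiased since P(explore) = eps).
            total += beta
            for i, (a, b) in enumerate(experts):
                log_w[i] -= eta * expert_cost(f, y, a, b, beta, delta_fp, delta_fn) / eps
        else:
            # Follow the sampled expert. Its cost is realised even though the
            # learner does not observe y here, so no weight update is made.
            total += expert_cost(f, y, lo, hi, beta, delta_fp, delta_fn)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    s = sum(w)
    return total / max(t, 1), [x / s for x in w]


if __name__ == "__main__":
    gen = random.Random(1)

    def sample():
        # Hypothetical synthetic stream: 30% positives, noisy confidence scores.
        y = 1 if gen.random() < 0.3 else -1
        f = min(1.0, max(0.0, gen.gauss(0.7 if y == 1 else 0.3, 0.15)))
        return f, y

    data = [sample() for _ in range(5000)]
    avg, w = hedge_two_thresholds(data, beta=0.3, delta_fp=0.7, delta_fn=1.0,
                                  eta=0.01, eps=0.1)
    best = make_experts()[max(range(len(w)), key=w.__getitem__)]
    print(f"average cost {avg:.3f}, heaviest expert {best}")
```

Updating only on exploration rounds keeps the loss estimates unbiased at the price of ignoring labels that arrive when the sampled expert happens to offload; the $\epsilon\beta$ term in the TL;DR bound reflects the cost of those forced offloads.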

Theorems & Definitions (8)

  • Definition 1: Calibrated Model
  • Theorem 1
  • Remark 1
  • Lemma 1
  • Lemma 2
  • Theorem 2
  • Corollary 1
  • Theorem 3