Delay Optimization in a Simple Offloading System: Extended Version

Darin Jeff, Eytan Modiano

TL;DR

This work analyzes a two-stage computation offloading system with local and cloud servers and two service modes that differ in workload splitting. By introducing a canonical transformation and analyzing a tunable-mode benchmark, it derives closed-form expressions for delay and optimal resource allocation, and characterizes a breakaway structure in the delay-optimal assignment where the cloud-heavy mode is favored at low loads but the local-heavy mode is engaged as load grows. The dual-mode delay is decomposed into a tunable-mode delay plus an overhead term, yielding a universal lower bound that guides design; conditions for achieving or approaching this bound are identified. Through stability analysis and numerical evaluation, the paper provides design principles for throughput-efficient mode designs and reveals trade-offs between delay and throughput under different load regimes.

Abstract

We consider a computation offloading system where jobs are processed sequentially at a local server followed by a higher-capacity cloud server. The system offers two service modes, differing in how the processing is split between the servers. Our goal is to design an optimal policy for assigning jobs to service modes and partitioning server resources in order to minimize delay. We begin by characterizing the system's stability region and establishing design principles for service modes that maximize throughput. For any given job assignment strategy, we derive the optimal resource partitioning and present a closed-form expression for the resulting delay. Moreover, we establish that the delay-optimal assignment policy exhibits a distinct breakaway structure: at low system loads, it is optimal to route all jobs through a single service mode, whereas beyond a critical load threshold, jobs must be assigned across both modes. We conclude by validating these theoretical insights through numerical evaluation.
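The interplay between job assignment and resource partitioning described above can be explored with a toy model. The sketch below is an illustration under assumptions that the abstract does not state: Poisson arrivals at rate `lam`, exponentially distributed work, every (server, mode) pair behaving as an M/M/1 queue with its own dedicated share of capacity, and capacity split by the classic square-root rule for M/M/1 queues. The names `dual_mode_delay`, `best_assignment`, `work`, and `cap` are hypothetical, and the formula is not taken from the paper.

```python
import math


def dual_mode_delay(p, lam, work, cap):
    """Mean delay per job in a toy dual-mode tandem model (all assumptions).

    p    : fraction of jobs assigned to mode 1 (Poisson splitting)
    lam  : total Poisson arrival rate, must be positive
    work : ((w1_local, w1_cloud), (w2_local, w2_cloud)), mean work per job
    cap  : (local_capacity, cloud_capacity)
    Returns math.inf when either server is overloaded.
    """
    # Splitting a Poisson stream with probability p yields two independent
    # Poisson streams with rates p*lam and (1-p)*lam.
    rates = (p * lam, (1.0 - p) * lam)
    total = 0.0
    for s in range(2):  # s = 0: local server, s = 1: cloud server
        load = sum(rates[m] * work[m][s] for m in range(2))
        spare = cap[s] - load
        if spare <= 0:
            return math.inf
        # Under the square-root capacity split, the sum over modes of
        # (rate * mean sojourn time) equals (sum of sqrt(rate*work))^2 / spare.
        root_sum = sum(math.sqrt(rates[m] * work[m][s]) for m in range(2))
        total += root_sum ** 2 / spare
    return total / lam  # Little's law: average over all jobs


def best_assignment(lam, work, cap, grid=1001):
    """Grid search for the assignment fraction p that minimizes toy delay."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda p: dual_mode_delay(p, lam, work, cap))
```

Sweeping `lam` and plotting `best_assignment(lam, work, cap)` is one way to look for the load at which the minimizer leaves an endpoint of $[0, 1]$ in this toy setting. Whether and where that happens depends on the assumed `work` and `cap` values.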

Paper Structure

This paper contains 25 sections, 12 theorems, 66 equations, 6 figures.

Key Result

theorem 1

For arrival rates $\lambda \in \Lambda_{\text{TM}}$, the theorem states the delay-optimal service fraction parameter $f^*(\lambda)$ and the corresponding average delay under $f^*(\lambda)$.

Figures (6)

  • Figure 1: Dual-Mode System depicting job arrivals, probabilistic service mode assignment, and dedicated resource allocation at the servers.
  • Figure 2: Tunable-Mode System depicting job arrivals, and service times at each server.
  • Figure 3: Independent subsystems resulting from Poisson splitting in the dual-mode configuration.
  • Figure 4: Delay $T_{\text{DM}}(p;\lambda)$ vs. assignment parameter $p$ under optimal partitioning, shown for increasing system loads. Each $p^*$ marks the optimal assignment parameter for the corresponding load.
  • Figure 5: Optimal assignment $p^*(\lambda)$ vs. load $\rho$. Systems A and B exhibit breakaway transitions at $\rho_{\text{break}}^A$ and $\rho_{\text{break}}^B$. System C remains at $p^* = 1$.
  • ...and 1 more figure

Theorems & Definitions (14)

  • Remark: Model generality
  • theorem 1
  • theorem 2: Stability Region
  • lemma 1: Achievability
  • lemma 2: Converse
  • definition 1
  • theorem 3: Optimal Resource Allocation
  • proposition 1
  • proposition 2
  • theorem 4: Redundant Mode
  • ...and 4 more