Delay Optimization in a Simple Offloading System: Extended Version
Darin Jeff, Eytan Modiano
TL;DR
This work analyzes a two-stage computation offloading system with local and cloud servers and two service modes that differ in how the workload is split between them. By introducing a canonical transformation and analyzing a tunable-mode benchmark, it derives closed-form expressions for delay and optimal resource allocation. It also characterizes a breakaway structure in the delay-optimal assignment, in which the cloud-heavy mode is favored at low loads and the local-heavy mode is engaged as the load grows. The dual-mode delay is decomposed into a tunable-mode delay plus an overhead term, yielding a universal lower bound that guides design, and conditions for achieving or approaching this bound are identified. Through stability analysis and numerical evaluation, the paper provides design principles for throughput-efficient service modes and reveals trade-offs between delay and throughput under different load regimes.
Abstract
We consider a computation offloading system where jobs are processed sequentially at a local server followed by a higher-capacity cloud server. The system offers two service modes, differing in how the processing is split between the servers. Our goal is to design an optimal policy for assigning jobs to service modes and partitioning server resources in order to minimize delay. We begin by characterizing the system's stability region and establishing design principles for service modes that maximize throughput. For any given job assignment strategy, we derive the optimal resource partitioning and present a closed-form expression for the resulting delay. Moreover, we establish that the delay-optimal assignment policy exhibits a distinct breakaway structure: at low system loads, it is optimal to route all jobs through a single service mode, whereas beyond a critical load threshold, jobs must be assigned across both modes. We conclude by validating these theoretical insights through numerical evaluation.
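The stability question raised in the abstract can be illustrated with a simple fluid-capacity sketch. It rests on hypothetical assumptions that are not taken from the paper: each mode processes a fixed fraction of every job's work at the local server and the remainder at the cloud server, each server has a fixed processing capacity, and a fraction `p_mode1` of jobs use the first mode. Under these assumptions a split is stable when each server's average offered work rate is strictly below its capacity. All function and parameter names are illustrative only.

```python
def offered_loads(arrival_rate, mean_work, p_mode1, frac1, frac2):
    """Average work rate (local, cloud) offered to each server.

    Hypothetical model: frac1/frac2 are the local fractions of work in
    mode 1/mode 2; the rest of each job's work goes to the cloud.
    """
    local_share = p_mode1 * frac1 + (1.0 - p_mode1) * frac2
    work_rate = arrival_rate * mean_work
    return work_rate * local_share, work_rate * (1.0 - local_share)


def is_stable(arrival_rate, mean_work, p_mode1, frac1, frac2, cap_local, cap_cloud):
    """True if both servers' offered work rate is strictly below capacity."""
    local_load, cloud_load = offered_loads(arrival_rate, mean_work, p_mode1, frac1, frac2)
    return local_load < cap_local and cloud_load < cap_cloud


def max_throughput(mean_work, frac1, frac2, cap_local, cap_cloud):
    """Supremum of stable arrival rates over all mode-assignment fractions.

    Mixing the two modes can realize any local share between frac1 and frac2.
    The local limit falls and the cloud limit rises with the local share, so
    the best share is the capacity-balanced one, clamped to that interval.
    """
    lo, hi = sorted((frac1, frac2))
    balanced = cap_local / (cap_local + cap_cloud)
    share = min(max(balanced, lo), hi)
    limits = []
    if share > 0:
        limits.append(cap_local / (mean_work * share))
    if share < 1:
        limits.append(cap_cloud / (mean_work * (1.0 - share)))
    return min(limits)


if __name__ == "__main__":
    # Modes bracket the balanced share 1/4, so both servers can be fully used.
    print(max_throughput(1.0, 0.1, 0.5, cap_local=1.0, cap_cloud=3.0))
    # Both modes are local-heavy, so the local server becomes the bottleneck.
    print(max_throughput(1.0, 0.5, 0.9, cap_local=1.0, cap_cloud=3.0))
```

In this toy model, throughput is maximized when the two modes' local fractions bracket the capacity-balanced share; when they do not, one server saturates first and limits the stable arrival rate.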
