Approximations to Study the Impact of the Service Discipline in Systems with Redundancy

Nicolas Gast, Benny van Houdt

TL;DR

The paper studies how the service discipline shapes queue length in large redundancy-d systems. It introduces a discipline-agnostic mean-field approximation and refines it with pair and triplet approximations for PS and with pair approximations for FCFS, LPS(K), and LCFS. Key contributions include deriving transient and fixed-point ODEs, developing a dynamic-graph RED-PS model, and showing numerically that FCFS yields the shortest queue lengths, with differences between disciplines growing at higher loads. The approximations are highly accurate, though not asymptotically exact, and tractable enough to inform the design of load-balancing and scheduling policies in large-scale systems.

Abstract

As job redundancy has been recognized as an effective means to improve performance of large-scale computer systems, queueing systems with redundancy have been studied by various authors. Existing results include methods to compute the queue length distribution and response time but only when the service discipline is First-Come-First-Served (FCFS). For other service disciplines, such as Processor Sharing (PS), or Last-Come-First-Served (LCFS), only the stability conditions are known. In this paper we develop the first methods to approximate the queue length distribution in a queueing system with redundancy under various service disciplines. We focus on a system with exponential job sizes, i.i.d. copies, and a large number of servers. We first derive a mean field approximation that is independent of the scheduling policy. In order to study the impact of service discipline, we then derive refinements of this approximation to specific scheduling policies. In the case of Processor Sharing, we provide a pair and a triplet approximation. The pair approximation can be regarded as a refinement of the classic mean field approximation and takes the service discipline into account, while the triplet approximation further refines the pair approximation. We also develop a pair approximation for three other service disciplines: First-Come-First-Served, Limited Processor Sharing and Last-Come-First-Served. We present numerical evidence that shows that all the approximations presented in the paper are highly accurate, but that none of them are asymptotically exact (as the number of servers goes to infinity). This makes these approximations suitable to study the impact of the service discipline on the queue length distribution. Our results show that FCFS yields the shortest queue length, and that the differences are more substantial at higher loads.

Paper Structure (50 sections, 1 theorem, 65 equations, 6 figures, 15 tables)

Key Result

Theorem 1

The set of ODEs given by eq:indepq has a unique fixed point given by [...], where $\bar{q}$ is the unique solution on $(0,\infty)$ of the equation [...], where $\gamma(s,x) = \int_0^x t^{s-1} e^{-t} dt$ is the lower incomplete gamma function and the right-hand side is increasing in $\bar{q}$.
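
As a numerical aid for the lower incomplete gamma function $\gamma(s,x)$ that appears in Theorem 1, the following is a minimal sketch that evaluates it with the standard power series $\gamma(s,x) = x^s e^{-x} \sum_{k \ge 0} x^k / (s(s+1)\cdots(s+k))$. The function name `lower_incomplete_gamma` and the parameters `tol` and `max_terms` are illustrative choices, not taken from the paper, and the sketch does not attempt to solve the fixed-point equation itself.

```python
import math


def lower_incomplete_gamma(s, x, tol=1e-15, max_terms=10_000):
    """Evaluate gamma(s, x) = int_0^x t^(s-1) e^(-t) dt for s > 0 and x >= 0.

    Uses the power series
        gamma(s, x) = x^s e^(-x) * sum_{k>=0} x^k / (s (s+1) ... (s+k)).
    The name and the tolerance parameters are illustrative assumptions.
    """
    if s <= 0 or x < 0:
        raise ValueError("requires s > 0 and x >= 0")
    if x == 0:
        return 0.0
    term = 1.0 / s          # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= x / (s + k)  # ratio between consecutive series terms
        total += term
        if term < tol * total:
            break
    # x^s e^(-x) written as exp(s log x - x) to limit overflow
    return math.exp(s * math.log(x) - x) * total
```

For $s = 1$ the series reduces to the closed form $\gamma(1,x) = 1 - e^{-x}$, which gives a quick sanity check of the implementation.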

Figures (6)

  • Figure 1: Example of a dynamic random graph for a system with $n=10$ servers. The labels on the servers indicate degrees. A new edge is added between two nodes at rate $2\lambda$. An edge connecting two nodes $(u,v)$ of degrees $d_t(u)$ and $d_t(v)$ disappears at rate $1/d_t(u)+1/d_t(v)$.
  • Figure 2: Average queue length as a function of the number of servers $n$: we compare the numbers obtained by simulation (for finite $n$) to the pair and triplet approximations.
  • Figure 3: Queue length distributions of the various policies (computed by the pair approximations).
  • Figure 4: Average queue length as a function of $\lambda$. Unless specified otherwise, all numbers are computed by using the pair approximations that we derive in the paper.
  • Figure 5: Rate at which the buddy replica disappears as a function of the state of the first job (as a function of the queue length for PS and of the position of the job or the queue length for FCFS or LCFS).
  • ...and 1 more figures
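
The dynamic random graph described in the caption of Figure 1 can be sketched as a small event-driven simulation, in which servers are nodes, jobs with two replicas are edges, and an edge $(u,v)$ disappears at rate $1/d_t(u)+1/d_t(v)$. This is a hedged illustration rather than the authors' code. It assumes that jobs arrive at total rate $\lambda n$ and pick two distinct servers uniformly at random, which is one reading of the edge-creation rate in the caption. The function name `simulate_red_ps` and all parameter names are hypothetical.

```python
import random


def simulate_red_ps(n=50, lam=0.5, horizon=200.0, seed=0):
    """Simulate the dynamic graph of Figure 1 (jobs = edges, servers = nodes).

    Assumptions (not from the paper): total arrival rate lam * n, each job
    picks two distinct servers uniformly at random, unit service rate.
    Returns (time-averaged queue length per server, final degrees, jobs).
    """
    rng = random.Random(seed)
    jobs = {}                               # job id -> (u, v)
    at_server = [set() for _ in range(n)]   # job ids present at each server
    t, next_id, area = 0.0, 0, 0.0
    while True:
        busy = [u for u in range(n) if at_server[u]]
        arrival_rate = lam * n
        # Summing 1/d(u) + 1/d(v) over all edges gives the number of busy servers.
        departure_rate = float(len(busy))
        total = arrival_rate + departure_rate
        dt = rng.expovariate(total)
        step = min(dt, horizon - t)
        area += step * 2 * len(jobs)        # each job holds two replicas
        if t + dt > horizon:
            break
        t += dt
        if rng.random() * total < arrival_rate:
            u, v = rng.sample(range(n), 2)  # new edge between two distinct nodes
            jobs[next_id] = (u, v)
            at_server[u].add(next_id)
            at_server[v].add(next_id)
            next_id += 1
        else:
            # Uniform busy server, then uniform job at that server: the edge
            # (a, b) is removed with probability proportional to 1/d(a) + 1/d(b).
            u = rng.choice(busy)
            job = rng.choice(sorted(at_server[u]))
            a, b = jobs.pop(job)
            at_server[a].discard(job)
            at_server[b].discard(job)
    degrees = [len(s) for s in at_server]
    return area / (horizon * n), degrees, jobs
```

Calling `simulate_red_ps(n=10, lam=0.5)` returns the time-averaged number of replicas per server together with the final node degrees, which play the role of the server labels in Figure 1. Sampling a busy server first and then one of its jobs avoids computing per-edge rates explicitly.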

Theorems & Definitions (2)

  • Theorem 1
  • proof