FailLite: Failure-Resilient Model Serving for Resource-Constrained Edge Environments

Li Wu, Walid A. Hanafy, Tarek Abdelzaher, David Irwin, Jesse Milzman, Prashant Shenoy

TL;DR

FailLite tackles failure resilience in resource-constrained edge environments by combining heterogeneous replication with a two-step failure handling strategy. It uses proactive warm replicas for highly critical applications and progressive loading of cold replicas for others, guided by ILP-based placement and a greedy progressive loading heuristic. Empirical results show an MTTR of $175.5$ ms with only a $0.6\%$ accuracy loss over 27 models, and large-scale simulations indicate substantial recovery gains under edge-site failures. This approach enables resilient, low-latency inference at the edge without full replication, making edge deployments more robust to hardware, network, and site outages.

Abstract

Model serving systems have become popular for deploying deep learning models for various latency-sensitive inference tasks. While traditional replication-based methods have been used for failure-resilient model serving in the cloud, such methods are often infeasible in edge environments due to significant resource constraints that preclude full replication. To address this problem, this paper presents FailLite, a failure-resilient model serving system that employs (i) heterogeneous replication, where failover models are smaller variants of the original model, (ii) an intelligent approach that uses warm replicas to ensure quick failover for critical applications and cold replicas for the rest, and (iii) progressive failover to provide low mean time to recovery (MTTR) for the remaining applications. We implement a full prototype of our system and demonstrate its efficacy on an experimental edge testbed. Our results using 27 models show that FailLite can recover all failed applications with 175.5 ms MTTR and only a 0.6% reduction in accuracy.
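
The warm/cold split and the idea of progressive failover can be illustrated with a minimal sketch. This is not the paper's ILP placement or its greedy heuristic; it only assumes that a cold failover first loads the fastest-loading variant that fits in the free memory and then upgrades to the most accurate variant that fits. The `Variant` fields, `backup_mode`, and `progressive_plan` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """One model variant (all fields are illustrative assumptions)."""
    name: str
    memory_mb: int    # memory needed to host the variant
    accuracy: float   # accuracy of the variant
    load_ms: float    # time to load the variant from a cold state


def backup_mode(is_critical: bool) -> str:
    """Critical applications get a pre-loaded (warm) replica, others a cold one."""
    return "warm" if is_critical else "cold"


def progressive_plan(variants: List[Variant], free_memory_mb: int) -> List[Variant]:
    """Return the load order for a cold failover.

    Assumed policy: serve quickly from the fastest-loading variant that fits,
    then switch to the most accurate variant that fits.
    """
    fitting = [v for v in variants if v.memory_mb <= free_memory_mb]
    if not fitting:
        return []  # nothing fits on this server
    first = min(fitting, key=lambda v: v.load_ms)
    best = max(fitting, key=lambda v: v.accuracy)
    return [first] if first == best else [first, best]
```

Under these assumptions, the time to recovery is bounded by the load time of the first variant in the plan, while steady-state accuracy is determined by the last one.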

Paper Structure

This paper contains 25 sections, 2 equations, 13 figures, 1 table, 1 algorithm.

Figures (13)

  • Figure 1: Approaches to utilize a failover replica.
  • Figure 2: Accuracy-Resource Tradeoff (a) and Loading time (b) across DNN models.
  • Figure 3: Overview of FailLite's two-step approach. Identical shapes represent the same application, and shape size reflects the model variant.
  • Figure 4: Overview of FailLite Architecture.
  • Figure 5: FailLite behavior across different types of backups, showing the advantages of warm backups and our proposed progressive failover.
  • ...and 8 more figures