
Erasure Coded Neural Network Inference via Fisher Averaging

Divyansh Jhunjhunwala, Neharika Jali, Gauri Joshi, Shiqiang Wang

TL;DR

This paper designs a method to code over neural networks, and proposes a practical algorithm COIN that leverages the diagonal Fisher information to create a coded model that approximately outputs the desired linear combination of outputs of the given neural networks.

Abstract

Erasure-coded computing has been successfully used in cloud systems to reduce tail latency caused by factors such as straggling servers and heterogeneous traffic variations. A majority of cloud computing traffic now consists of inference on neural networks on shared resources, where the response time of inference queries is adversely affected by the same factors. However, current erasure coding techniques are largely focused on linear computations such as matrix-vector and matrix-matrix multiplications and hence do not work for highly non-linear neural network functions. In this paper, we seek to design a method to code over neural networks: given two or more neural network models, we ask how to construct a coded model whose output is a linear combination of the outputs of the given neural networks. We formulate the problem as a KL barycenter problem and propose a practical algorithm, COIN, that leverages the diagonal Fisher information to create a coded model that approximately outputs the desired linear combination of outputs. We conduct experiments to perform erasure coding over neural networks trained on real-world vision datasets and show that the accuracy of the decoded outputs using COIN is significantly higher than other baselines while being extremely compute-efficient.
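To make the abstract's contrast concrete, the following minimal sketch shows why a simple sum code works for matrix-vector products but not for a non-linear layer. The matrices `A1`, `A2`, the input `x`, the three-server layout, and the `relu_layer` helper are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical toy data: two "models" that are plain matrices.
A1 = np.array([[1.0, -2.0], [0.0, 1.0]])
A2 = np.array([[3.0, 1.0], [2.0, 0.0]])
x = np.array([1.0, 1.0])

# Linear case: servers 1 and 2 compute A1 @ x and A2 @ x,
# and a third (coded) server computes (A1 + A2) @ x.
y1 = A1 @ x
y2 = A2 @ x
y_coded = (A1 + A2) @ x

# If server 2 straggles, its output is recovered from the other two.
y2_recovered = y_coded - y1
linear_error = np.max(np.abs(y2_recovered - y2))

# Non-linear case: the same parameter-sum code applied to a ReLU layer.
def relu_layer(W, v):
    """One ReLU layer, standing in for a non-linear network."""
    return np.maximum(W @ v, 0.0)

z1 = relu_layer(A1, x)
z2 = relu_layer(A2, x)
z_naive = relu_layer(A1 + A2, x)

# Decoding the same way no longer returns the missing output.
nonlinear_error = np.max(np.abs((z_naive - z1) - z2))

print("linear decoding error:", linear_error)
print("non-linear decoding error:", nonlinear_error)
```

In this toy case the linear code decodes exactly, while summing the parameters of the ReLU layers leaves a non-zero decoding error, which is the gap the coded-model construction is meant to close.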

Paper Structure

This paper contains 8 sections, 13 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Illustration of how our proposed method COIN (see Algorithm 1) computes the coded model's parameters $\bm{\theta}_c$ such that its output $f_{\theta_c}({\bm x}) \approx \beta_1 f_{\theta_1}({\bm x}) + \beta_2 f_{\theta_2}({\bm x})$, a linear combination of the outputs of $f_{\theta_1}({\bm x})$ and $f_{\theta_2}({\bm x})$. Unlike ensemble distillation, the parameters $\bm{\theta}_c$ are computed without training a model from scratch.
  • Figure 2: (a) Average normalized decoding accuracy on the train set and test set for the Ensemble Distillation baseline as a function of the number of optimization epochs, when coding over networks trained on CIFAR-10 and MNIST. The accuracy on the train set reaches close to $100$, but the accuracy on the test set saturates close to $75$, implying overfitting. (b) Average normalized decoding accuracy for COIN, RegMean and Ensemble Distillation in the same setting as a function of the number of datapoints $P$. The accuracy of COIN increases only slightly as $P$ increases, which demonstrates the data-efficiency of our approach.
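The relation in the Figure 1 caption, $f_{\theta_c}({\bm x}) \approx \beta_1 f_{\theta_1}({\bm x}) + \beta_2 f_{\theta_2}({\bm x})$, suggests how a missing output could be decoded from the coded output and the remaining one. The sketch below pairs that decoding step with a generic element-wise Fisher-weighted parameter average. The exact COIN update is not given in this excerpt, so `fisher_weighted_average`, `decode_missing`, the `eps` term, and the toy linear models are assumptions for illustration only. In practice the diagonal Fisher would typically be estimated as the mean squared gradient of the log-likelihood over $P$ datapoints; unit values are used here to keep the example checkable.

```python
import numpy as np

def fisher_weighted_average(thetas, fishers, betas, eps=1e-8):
    """Element-wise Fisher-weighted average of parameters (assumed form, not the paper's exact rule)."""
    num = sum(b * f * t for b, f, t in zip(betas, fishers, thetas))
    den = sum(b * f for b, f in zip(betas, fishers))
    return num / (den + eps)

def decode_missing(coded_out, known_out, beta_known, beta_missing):
    """Solve coded_out ~= beta_known * known_out + beta_missing * missing_out for missing_out."""
    return (coded_out - beta_known * known_out) / beta_missing

# Hypothetical toy setup: two linear "models" f_theta(x) = theta @ x.
rng = np.random.default_rng(0)
theta_1 = rng.normal(size=(3, 4))
theta_2 = rng.normal(size=(3, 4))
x = rng.normal(size=4)

betas = (0.5, 0.5)
# Placeholder diagonal Fisher values (all ones), an assumption for this demo.
fishers = (np.ones_like(theta_1), np.ones_like(theta_2))

theta_c = fisher_weighted_average((theta_1, theta_2), fishers, betas)

# Suppose model 2 straggles: decode its output from the coded and known outputs.
coded_out = theta_c @ x
recovered = decode_missing(coded_out, theta_1 @ x, betas[0], betas[1])

print("max decoding error:", np.max(np.abs(recovered - theta_2 @ x)))
```

With unit Fisher values and coefficients summing to one, the weighted average reduces to plain parameter averaging, which is exact for linear models. For real networks the relation holds only approximately, so the decoded output is an estimate.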