HyRES: A Hybrid Replication and Erasure Coding Approach to Data Storage

Daniel E. Lucani, Marcell Fehér

TL;DR

HyRES introduces a flexible hybrid redundancy framework that blends replication, MDS erasure coding, and local repairability to trade off storage cost, robustness, and repair traffic in distributed storage. It provides a mathematical framework and closed-form results for comparing HyRES against pure replication and MDS codes as a function of network size and number of lost nodes, and validates the findings with simulations. The results show that HyRES can reduce overall storage costs and file-loss probability while maintaining competitive repair traffic, and that it generalizes prior hybrid approaches. The approach offers practical gains for managing hot and cold data in large-scale distributed storage systems and motivates network-size-aware design choices for redundancy strategies.

Abstract

Reliability in distributed storage systems has typically focused on the design and deployment of data replication or erasure coding techniques. Although some scenarios have considered the use of replication for hot data and erasure coding for cold data in the same system, each is designed in isolation. We propose HyRES, a hybrid scheme that incorporates the best characteristics of each scheme, resulting in additional design flexibility and better potential performance for the system. We show that HyRES generalizes previously proposed hybrid schemes. We characterize the theoretical performance of HyRES, as well as that of replication and erasure coding, considering the effects of the size of the storage network. We validate our theoretical results using simulations. These results show that HyRES can simultaneously yield lower storage costs than replication, lower probabilities of file loss than replication and erasure coding with similar worst-case performance, and even lower effective repair traffic than replication when the network size is taken into account.

Paper Structure

This paper contains 9 sections, 10 theorems, 7 equations, 5 figures.

Key Result

Lemma 1

The worst-case number of lost fragments that causes file loss in the HyRES $(R,e,1,k)$ and HyRES $(R,e+1,0,k)$ schemes is $R + e + 1$, i.e., no loss occurs if at most $R+e$ fragments are lost. This worst-case loss occurs when $R$ copies of a single fragment are lost and all $e+1$ coded fragments are a

Figures (5)

  • Figure 1: Repair Traffic versus Storage Costs for a single node loss under various schemes. Files are split into $k=10$ fragments before introducing redundancy in the case of replication, XORBAS, Reed-Solomon, and the proposed Hybrid scheme. The MBR and MSR points of regenerating codes assume redundancy equivalent to that of the Reed-Solomon code.
  • Figure 2: Example of HyRES $(2,2,1,10)$ illustrating two replicas of the original fragments, two MDS erasure coded fragments, and a locally repairable linear combination of the two MDS erasure coded fragments
  • Figure 3: File loss probability for different node losses
  • Figure 4: Measured repair traffic for a single node loss in the network considering different network sizes
  • Figure 5: Average Repair Traffic per File and File Loss Probability for $N=30$ nodes for various node losses
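
As a rough illustration of the HyRES $(R,e,l,k)$ notation used in Lemma 1 and Figure 2, the sketch below counts the fragments stored for one file and evaluates the worst-case loss threshold stated in Lemma 1. The class and method names are hypothetical, and the fragment count assumes that every original fragment is stored $R$ times and every coded fragment once, which is how the Figure 2 caption reads; the paper's own cost definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyresParams:
    """Parameters of a HyRES (R, e, l, k) scheme (hypothetical helper, not from the paper)."""
    replicas: int     # R: copies kept of each original fragment
    mds_coded: int    # e: MDS erasure coded fragments
    local_coded: int  # l: locally repairable combinations of the coded fragments
    k: int            # number of original fragments per file

    def stored_fragments(self) -> int:
        # Assumption: R copies of each of the k original fragments,
        # plus each coded fragment stored once.
        return self.replicas * self.k + self.mds_coded + self.local_coded

    def storage_overhead(self) -> float:
        # Stored fragments relative to the k original fragments.
        return self.stored_fragments() / self.k

    def worst_case_losses(self) -> int:
        # Lemma 1 is stated only for (R, e, 1, k) and (R, e+1, 0, k):
        # the R copies of one fragment plus all coded fragments must be lost.
        if self.local_coded not in (0, 1):
            raise ValueError("Lemma 1 only covers schemes with l = 0 or l = 1")
        return self.replicas + self.mds_coded + self.local_coded


if __name__ == "__main__":
    fig2 = HyresParams(replicas=2, mds_coded=2, local_coded=1, k=10)
    print(fig2.stored_fragments())   # 23
    print(fig2.worst_case_losses())  # 5
```

For the HyRES $(2,2,1,10)$ example of Figure 2, this gives 23 stored fragments for 10 original ones, and file loss requires at least $R+e+1=5$ lost fragments. The HyRES $(2,3,0,10)$ scheme reaches the same threshold, in line with Lemma 1.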

Theorems & Definitions (16)

  • Definition 1
  • Lemma 1
  • Proof Sketch
  • Lemma 2
  • Proof Sketch
  • Lemma 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • ...and 6 more