Reliability and Availability in Virtualized Networks: A Survey on Standards, Modeling Approaches, and Research Challenges

Mario Di Mauro, Walter Cerroni, Fabio Postiglione, Massimo Tornatore, Kishor S. Trivedi

TL;DR

This survey explains why reliability and availability are critical for virtualized networks built on NFV, covering the guidance in the ETSI NFV-REL standards and practical modeling approaches. It classifies modeling formalisms into non-state-space, state-space, and multi-level categories, and illustrates their applicability to VNFs, SFCs, NFVI, and MANO through concrete examples. It also inventories modeling tools and the formalisms they support (e.g., SHARPE, TimeNET, SRNs, SANs) and discusses the trade-offs, strengths, and limitations of each formalism for real-world NFV environments. The paper concludes with open challenges (scalability, fault tolerance, security, energy efficiency, and automation) and argues for integrated, ETSI-aligned, tool-supported modeling to advance dependable virtualized networks.

Abstract

The rise of Network Function Virtualization (NFV) has transformed network infrastructures by replacing fixed hardware with software-based Virtualized Network Functions (VNFs), enabling greater agility, scalability, and cost efficiency. Virtualization increases the distribution of system components and introduces stronger interdependencies. As a result, failures become harder to predict, monitor, and manage than in traditional monolithic networks. Reliability, i.e., the ability of a system to perform regularly under specified conditions, and availability, i.e., the probability that a system is ready for use, are critical requirements that must be guaranteed to maintain seamless network operations. Accurate modeling of these aspects is crucial for designing robust, fault-tolerant virtualized systems that can withstand service disruptions. This survey focuses on the reliability and availability attributes of virtualized networks from a modeling perspective. After introducing the NFV architecture and basic definitions, we discuss the standardization efforts of the European Telecommunications Standards Institute (ETSI), which provides guidelines and recommendations through a series of standard documents focusing on reliability and availability. Next, we explore several formalisms proposed in the literature for characterizing reliability and availability, with a focus on their application to modeling the failure and repair behavior of virtualized networks through practical examples. Then, we review numerous references demonstrating how different authors adopt specific methods to characterize the reliability and/or availability of virtualized systems. Moreover, we present a selection of the most valuable software tools that support the modeling of reliable virtualized networks. Finally, we discuss a set of open problems with the aim of encouraging readers to explore further advances in this field.

Paper Structure

This paper contains 58 sections, 11 equations, 19 figures, 9 tables.

Figures (19)

  • Figure 1: Distribution of years in the selected references.
  • Figure 2: Paper Organization.
  • Figure 3: The standard ETSI-NFV architecture (on the left) with the main connected paradigms: network slicing (top right), service function chaining (middle right), and software-defined networking (bottom right).
  • Figure 4: Availability and Reliability as attributes of Dependability according to the taxonomy of Avizienis et al.
  • Figure 5: Mean Time to Failure (MTTF) and Mean Time to Repair (MTTR).
  • ...and 14 more figures
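
The MTTF and MTTR quantities named in the Figure 5 caption relate to availability and reliability through textbook formulas. The following is a minimal sketch rather than the paper's own method: it uses the steady-state availability MTTF / (MTTF + MTTR), and it assumes a constant failure rate (exponentially distributed time to failure) for the reliability function and independent components for the series composition. All function names and the example figures are hypothetical.

```python
import math
from typing import Iterable


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a repairable component is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours: float, mttf_hours: float) -> float:
    """Probability of no failure in [0, t].

    Assumption: constant failure rate 1/MTTF (exponential lifetime).
    """
    if t_hours < 0 or mttf_hours <= 0:
        raise ValueError("t must be non-negative and MTTF positive")
    return math.exp(-t_hours / mttf_hours)


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain that is up only if every element is up.

    Assumption: elements fail and are repaired independently.
    """
    return math.prod(availabilities)


if __name__ == "__main__":
    # Hypothetical figures for three chained VNFs: (MTTF, MTTR) in hours.
    vnfs = [(2000.0, 1.0), (5000.0, 2.0), (1000.0, 0.5)]
    per_vnf = [steady_state_availability(f, r) for f, r in vnfs]
    print("per-VNF availability:", per_vnf)
    print("chain availability:", series_availability(per_vnf))
    print("24 h reliability of first VNF:", reliability(24.0, 2000.0))
```

The series product treats a service function chain as unavailable whenever any single VNF is down. Redundancy and the dependencies between components are exactly what the state-space and multi-level formalisms mentioned in the abstract are meant to capture.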