Reliability and Availability in Virtualized Networks: A Survey on Standards, Modeling Approaches, and Research Challenges
Mario Di Mauro, Walter Cerroni, Fabio Postiglione, Massimo Tornatore, Kishor S. Trivedi
TL;DR
This survey explains why reliability and availability are critical for NFV-based virtualized networks, detailing the guidance in the ETSI NFV-REL standards and practical modeling approaches. It classifies modeling formalisms into non-state-space, state-space, and multi-level (hierarchical) categories, illustrating their applicability to VNFs, Service Function Chains (SFCs), the NFV Infrastructure (NFVI), and Management and Orchestration (MANO) through concrete examples. It also inventories modeling tools (e.g., SHARPE, TimeNET) and the formalisms they support (e.g., Stochastic Reward Nets (SRNs), Stochastic Activity Networks (SANs)), and discusses the trade-offs, strengths, and limitations of each formalism in real-world NFV environments. The paper concludes with open challenges (scalability, fault tolerance, security, energy efficiency, and automation) and argues for integrated, ETSI-aligned, tool-supported modeling to advance dependable virtualized networks.
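To make the distinction between non-state-space and state-space formalisms concrete, the following is a minimal sketch that is not taken from the paper. It assumes a hypothetical active-active VNF with two replicas, exponentially distributed failure and repair times, and illustrative MTTF/MTTR values; the function names are hypothetical. The parallel reliability block diagram (RBD) assumes independent blocks, while the continuous-time Markov chain (CTMC) can also express a shared repair crew, a dependency an RBD cannot capture.

```python
def rbd_parallel(unit_availabilities):
    """Non-state-space model: parallel reliability block diagram (1-out-of-n).
    The system is down only if every block is down; blocks are assumed independent."""
    all_down = 1.0
    for a in unit_availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def ctmc_two_replicas(mttf, mttr, repair_crews=1):
    """State-space model: birth-death CTMC for two replicas, state = replicas up (2, 1, 0).
    Each up replica fails at rate 1/MTTF; each busy repair crew restores one replica
    at rate 1/MTTR. With a single crew, repairs queue (a dependency between replicas)."""
    if repair_crews < 1:
        raise ValueError("need at least one repair crew")
    lam, mu = 1.0 / mttf, 1.0 / mttr
    # Unnormalized steady-state probabilities from flow balance across each cut.
    p2 = 1.0
    p1 = p2 * (2 * lam) / mu                     # 2 -> 1 at 2*lam, 1 -> 2 at mu
    p0 = p1 * lam / (min(2, repair_crews) * mu)  # 1 -> 0 at lam, 0 -> 1 at min(2, c)*mu
    return (p2 + p1) / (p2 + p1 + p0)            # system up if at least one replica runs


if __name__ == "__main__":
    mttf, mttr = 1000.0, 10.0                    # hypothetical hours for one VNF replica
    a = mttf / (mttf + mttr)                     # single-replica steady-state availability
    print("RBD, independent repair :", rbd_parallel([a, a]))
    print("CTMC, 2 repair crews    :", ctmc_two_replicas(mttf, mttr, repair_crews=2))
    print("CTMC, 1 shared crew     :", ctmc_two_replicas(mttf, mttr, repair_crews=1))
```

With two repair crews the CTMC reproduces the RBD value, because independent repair is exactly what the RBD assumes; with one shared crew the CTMC yields a lower availability. Multi-level approaches combine the two ideas, for example solving a small CTMC per component and feeding the results into an upper-level RBD or fault tree.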
Abstract
The rise of Network Function Virtualization (NFV) has transformed network infrastructures by replacing fixed hardware with software-based Virtualized Network Functions (VNFs), enabling greater agility, scalability, and cost efficiency. Virtualization increases the distribution of system components and introduces stronger interdependencies. As a result, failures become harder to predict, monitor, and manage than in traditional monolithic networks. Reliability, i.e., the ability of a system to perform its intended function under specified conditions for a given period of time, and availability, i.e., the probability that a system is ready for use at a given instant, are critical requirements that must be guaranteed to maintain seamless network operations. Accurate modeling of these aspects is crucial for designing robust, fault-tolerant virtualized systems that can withstand service disruptions. This survey focuses on the reliability and availability attributes of virtualized networks from a modeling perspective. After introducing the NFV architecture and basic definitions, we discuss the standardization efforts of the European Telecommunications Standards Institute (ETSI), which provides guidelines and recommendations through a series of standard documents focusing on reliability and availability. Next, we explore several formalisms proposed in the literature for characterizing reliability and availability, with a focus on their application to modeling the failure and repair behavior of virtualized networks through practical examples. Then, we review numerous works demonstrating how different authors adopt specific methods to characterize the reliability and/or availability of virtualized systems. Moreover, we present a selection of the most valuable software tools that support the modeling of reliable virtualized networks. Finally, we discuss a set of open problems with the aim of encouraging readers to explore further advances in this field.
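The two definitions above measure different things, which a short calculation makes visible. The sketch below is illustrative only and not from the paper: it assumes a single repairable unit with an exponentially distributed time to failure, so that R(t) = exp(-t / MTTF), and uses the standard steady-state availability A = MTTF / (MTTF + MTTR). The MTTF/MTTR values and function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0


def reliability(t_hours, mttf_hours):
    """R(t): probability of no failure in [0, t], assuming exponential time to failure."""
    return math.exp(-t_hours / mttf_hours)


def steady_state_availability(mttf_hours, mttr_hours):
    """A = MTTF / (MTTF + MTTR): long-run fraction of time the unit is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_minutes(availability):
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60.0


if __name__ == "__main__":
    mttf, mttr = 10_000.0, 1.0  # hypothetical VNF instance, in hours
    A = steady_state_availability(mttf, mttr)
    print(f"R(1 year)   = {reliability(HOURS_PER_YEAR, mttf):.4f}")
    print(f"A           = {A:.6f}")
    print(f"downtime/yr = {annual_downtime_minutes(A):.1f} min")
```

Under these assumptions the unit has an availability of about 0.9999, yet its probability of running a full year without any failure is below one half: fast repair keeps availability high, but it does not improve reliability.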
