Learning Failure-Inducing Models for Testing Software-Defined Networks
Raphaël Ollando, Seung Yeob Shin, Lionel C. Briand
TL;DR
The paper addresses robust testing of SDN controllers by jointly generating failure-inducing test data and learning interpretable failure-inducing models. It introduces FuzzSDN, an iterative framework that combines ML-guided fuzzing with rule-based learning (RIPPER) and planning to efficiently explore the OpenFlow input space. An empirical evaluation on ONOS and RYU across multiple network sizes shows that FuzzSDN outperforms state-of-the-art fuzzers in producing failures and yields high-precision, high-recall failure models, with results that align with the literature on SDN failure conditions. The approach scales to larger networks and provides actionable diagnostics to guide fixes and validate changes in SDN controllers.
Abstract
Software-defined networks (SDNs) enable flexible and effective communication systems that are managed by centralized software controllers. However, such a controller can undermine the underlying communication network of an SDN-based system and thus must be carefully tested. When an SDN-based system fails, engineers need to understand precisely the conditions under which the failure occurs in order to address it. In this article, we introduce a machine learning-guided fuzzing method, named FuzzSDN, that aims at both (1) generating effective test data leading to failures in SDN-based systems and (2) learning accurate failure-inducing models that characterize the conditions under which such a system fails. To our knowledge, no existing work simultaneously addresses these two objectives for SDNs. We evaluate FuzzSDN by applying it to systems controlled by two open-source SDN controllers. Further, we compare FuzzSDN with two state-of-the-art methods for fuzzing SDNs and two baselines for learning failure-inducing models. Our results show that (1) compared to the state-of-the-art methods, FuzzSDN generates at least 12 times more failures, within the same time budget, with a controller that is fairly robust to fuzzing and (2) our failure-inducing models have, on average, a precision of 98% and a recall of 86%, significantly outperforming the baselines.
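To make the precision and recall figures concrete, the following is a minimal sketch of how a rule-based failure-inducing model could be scored against labelled test outcomes. It is not FuzzSDN's implementation: the function `precision_recall`, the rule `example_rule`, the `length` field, its threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A fuzzed control message, reduced to a dict of integer fields (assumption).
Message = Dict[str, int]


def precision_recall(
    rule: Callable[[Message], bool],
    samples: List[Tuple[Message, bool]],
) -> Tuple[float, float]:
    """Score a failure-predicting rule against (message, failed) pairs."""
    tp = fp = fn = 0
    for msg, failed in samples:
        predicted = rule(msg)
        if predicted and failed:
            tp += 1  # rule predicted a failure and one occurred
        elif predicted and not failed:
            fp += 1  # rule predicted a failure but the system held up
        elif not predicted and failed:
            fn += 1  # rule missed an actual failure
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def example_rule(msg: Message) -> bool:
    # Hypothetical rule: predict a failure when the length field is below 8.
    return msg["length"] < 8


# Hypothetical labelled outcomes: (message fields, did the system fail?)
samples = [
    ({"length": 4}, True),
    ({"length": 6}, True),
    ({"length": 7}, False),
    ({"length": 24}, True),
    ({"length": 32}, False),
]

precision, recall = precision_recall(example_rule, samples)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this reading, precision is the share of predicted failures that actually occurred, and recall is the share of actual failures that the rule predicted. A model with high precision but lower recall, as reported in the abstract, describes failure conditions reliably while missing some of them.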
