Analyzing Configuration Dependencies of File Systems
Tabassum Mahmud, Om Rameshwar Gatla, Duo Zhang, Carson Love, Ryan Bumann, Varun S Girimaji, Mai Zheng
TL;DR
The paper tackles configuration-related complexity in file systems. It studies configuration issues in Ext4 and XFS, identifies a prevalent pattern called multilevel configuration dependencies, and builds ConfD, an extensible framework with a core for dependency extraction and a suite of plugins for detecting configuration issues. ConfD uses metadata-assisted taint analysis on LLVM IR to connect parameters across components via shared FS metadata. The authors derive a taxonomy of Self Dependency, Cross-Parameter Dependency, and Cross-Component Dependency, and generate dependency-guided configuration states for targeted testing and regression analysis. In experiments on Ext4, XFS, and ZFS, ConfD extracts 160 multilevel dependencies with a low false-positive rate. The dependency-guided plugins uncover a range of issues (specification gaps, mishandled configurations, and regression test failures) and reduce false positives in regression testing. The work demonstrates the practical value of dependency-aware configuration analysis for file systems and explores extension to other storage systems (e.g., WiredTiger) and databases, with potential integration into CI pipelines and broader language support through LLVM.
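To make the three-way taxonomy and the idea of dependency-guided configuration states concrete, here is a minimal Python sketch. It is not ConfD's implementation: the component names (`mkfs`, `resize`), the parameters, and all three rules are hypothetical placeholders, and the state generator simply filters an exhaustive candidate space rather than using any taint analysis.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Callable, Dict, List, Tuple

# A configuration maps "component.parameter" to a value.
Config = Dict[str, object]


class DepKind(Enum):
    SELF = "Self Dependency"                        # constraint on one parameter
    CROSS_PARAMETER = "Cross-Parameter Dependency"  # parameters of one component
    CROSS_COMPONENT = "Cross-Component Dependency"  # parameters of different components


@dataclass(frozen=True)
class Dependency:
    kind: DepKind
    params: Tuple[str, ...]
    holds: Callable[[Config], bool]  # True when the configuration satisfies the rule


# Hypothetical rules, invented for illustration only.
RULES: List[Dependency] = [
    # Self: the block size must be one of a fixed set of values.
    Dependency(DepKind.SELF, ("mkfs.block_size",),
               lambda c: c["mkfs.block_size"] in (1024, 2048, 4096)),
    # Cross-parameter: two features of the same utility are mutually exclusive.
    Dependency(DepKind.CROSS_PARAMETER, ("mkfs.feature_a", "mkfs.feature_b"),
               lambda c: not (c["mkfs.feature_a"] and c["mkfs.feature_b"])),
    # Cross-component: a resize option requires a feature chosen at mkfs time.
    Dependency(DepKind.CROSS_COMPONENT, ("resize.online", "mkfs.feature_b"),
               lambda c: (not c["resize.online"]) or bool(c["mkfs.feature_b"])),
]


def violations(config: Config, rules: List[Dependency] = RULES) -> List[Dependency]:
    """Return every dependency that the given configuration breaks."""
    return [r for r in rules if not r.holds(config)]


def guided_states(space: Dict[str, list],
                  rules: List[Dependency] = RULES) -> List[Config]:
    """Enumerate candidate states and keep those satisfying all dependencies."""
    keys = list(space)
    states = (dict(zip(keys, values)) for values in product(*(space[k] for k in keys)))
    return [s for s in states if not violations(s, rules)]


if __name__ == "__main__":
    space = {
        "mkfs.block_size": [1024, 3000],
        "mkfs.feature_a": [True, False],
        "mkfs.feature_b": [True, False],
        "resize.online": [True, False],
    }
    for state in guided_states(space):
        print(state)
```

Filtering by dependencies shrinks the 16 candidate states in this toy space to the handful that are mutually consistent, which is the intuition behind using dependencies to avoid test failures caused purely by invalid configurations.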
Abstract
File systems play an essential role in modern society for managing precious data. To meet diverse needs, they often support many configuration parameters. Such flexibility comes at the price of additional complexity, which can lead to subtle configuration-related issues. To address this challenge, we study the configuration-related issues of two major file systems (i.e., Ext4 and XFS) in depth, and identify a prevalent pattern called multilevel configuration dependencies. Based on the study, we build an extensible tool called ConfD to extract the dependencies automatically, and create a set of plugins to address different configuration-related issues. Our experiments on Ext4, XFS, and a modern copy-on-write file system (i.e., ZFS) show that ConfD was able to extract 160 configuration dependencies for the file systems with a low false positive rate. Moreover, the dependency-guided plugins can identify various configuration issues (e.g., mishandling of configurations, regression test failures induced by valid configurations). We also explore the applicability of ConfD to a popular storage engine (i.e., WiredTiger). We hope that this comprehensive analysis of configuration dependencies of storage systems can shed light on addressing configuration-related challenges for the system community in general.
