Billion-files File Systems (BfFS): A Comparison
Sohail Shaikh
TL;DR
The study investigates how well popular Linux local filesystems (EXT4, XFS, BtrFS, F2FS, ZFS) scale to one billion files, assessing throughput, metadata overhead, disk utilization, and potential performance deterioration. It introduces a purpose-built benchmark with Creator/Reader applications (implemented in Java and C) and analyzes I/O latency across Linux VFS layers, using EXT4 as a baseline. Key findings show EXT4 and XFS handle large-scale workloads with inode tuning, BtrFS struggles with reads at extreme scales, F2FS is inode-constrained on HDDs, and ZFS delivers fast writes but high read latency and longer run times. The work provides practical guidance for system designers on filesystem choice for high-file-count workloads and offers a reproducible benchmarking framework for future evaluations.
Abstract
As the volume of data being produced grows at an exponential rate and must be processed quickly, the data needs to be available very close to the compute devices to reduce transfer latency. For this reason, local filesystems are receiving close attention, with the aim of understanding their inner workings, their performance, and, more importantly, their limitations. This study analyzes a few popular Linux filesystems (EXT4, XFS, BtrFS, ZFS, and F2FS) by creating, storing, and then reading back one billion files from the local filesystem. The study also captures and analyzes read/write throughput, storage block usage, disk space utilization and overheads, and other metrics useful for system designers and integrators. Furthermore, the study explores side effects such as filesystem performance degradation during and after the creation of these large numbers of files and folders.
