Billion-files File Systems (BfFS): A Comparison

Sohail Shaikh

TL;DR

The study investigates how well popular Linux local filesystems (EXT4, XFS, BtrFS, F2FS, ZFS) scale to one billion files, assessing throughput, metadata overhead, disk utilization, and potential performance deterioration. It introduces a purpose-built benchmark with Creator/Reader applications (implemented in Java and C) and analyzes I/O latency across Linux VFS layers, using EXT4 as a baseline. Key findings show EXT4 and XFS handle large-scale workloads with inode tuning, BtrFS struggles with reads at extreme scales, F2FS is inode-constrained on HDDs, and ZFS delivers fast writes but high read latency and longer run times. The work provides practical guidance for system designers on filesystem choice for high-file-count workloads and offers a reproducible benchmarking framework for future evaluations.

Abstract

As the volume of data being produced grows exponentially and must be processed quickly, the data needs to be available very close to the compute devices to reduce transfer latency. As a result, local filesystems are receiving close attention aimed at understanding their inner workings, their performance, and, more importantly, their limitations. This study analyzes a few popular Linux filesystems (EXT4, XFS, BtrFS, ZFS, and F2FS) by creating, storing, and then reading back one billion files from the local filesystem. The study also captures and analyzes read/write throughput, storage block usage, disk space utilization and overheads, and other metrics useful to system designers and integrators. Furthermore, it explores side effects such as filesystem performance degradation during and after the creation of such large numbers of files and folders.
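
The create-then-read-back measurement described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the study's own Creator/Reader tools are described as Java and C programs, so it is not the author's implementation. The function names `create_files` and `read_files`, the fixed file size, and the directory fan-out are all assumptions made for illustration. The study uses a file size distribution rather than a single size.

```python
import os
import tempfile
import time


def create_files(root, n_files, files_per_dir=1000, size=4096):
    """Create n_files files under root and return per-file write latencies (microseconds).

    Hypothetical helper: fixed-size payloads and a flat two-level layout are
    simplifying assumptions, not the layout used in the study.
    """
    payload = b"x" * size
    latencies_us = []
    for i in range(n_files):
        # Spread files over subdirectories so no single directory grows unbounded.
        subdir = os.path.join(root, f"d{i // files_per_dir:06d}")
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, f"f{i:09d}.dat")
        start = time.perf_counter_ns()
        with open(path, "wb") as fh:
            fh.write(payload)
        latencies_us.append((time.perf_counter_ns() - start) / 1000)
    return latencies_us


def read_files(root):
    """Read back every file under root.

    Returns (file_count, total_bytes, per-file read latencies in microseconds).
    Reads issued right after creation are likely served from the page cache,
    so a real benchmark would need to drop caches or remount first.
    """
    count = 0
    total_bytes = 0
    latencies_us = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            start = time.perf_counter_ns()
            with open(os.path.join(dirpath, name), "rb") as fh:
                total_bytes += len(fh.read())
            latencies_us.append((time.perf_counter_ns() - start) / 1000)
            count += 1
    return count, total_bytes, latencies_us


if __name__ == "__main__":
    # Tiny demonstration run; the study targets one billion files.
    with tempfile.TemporaryDirectory() as root:
        writes = create_files(root, n_files=2000)
        count, total_bytes, reads = read_files(root)
        print(f"created {len(writes)} files, read back {count} ({total_bytes} bytes)")
        print(f"mean write latency: {sum(writes) / len(writes):.1f} us")
        print(f"mean read latency:  {sum(reads) / len(reads):.1f} us")
```

Pointing `root` at a mount point of the filesystem under test and scaling `n_files` up would give per-file latency samples of the kind summarized in the paper's read and write distribution figures, though a run at billion-file scale would need streaming statistics instead of in-memory lists.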

Paper Structure

This paper contains 14 sections, 7 figures, 4 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Linux VFS Architecture
  • Figure 2: EXT4 Internal Structures
  • Figure 3: File Size Distribution
  • Figure 4: File Write Speed Distribution (in $\mu$sec)
  • Figure 5: File Read Speed Distribution (in $\mu$sec)
  • ...and 2 more figures