Fletch: File-System Metadata Caching in Programmable Switches

Qingxiu Liu, Jiazhen Cai, Siyuan Sheng, Yuhui Chen, Lu Tang, Zhirong Shen, Patrick P. C. Lee

TL;DR

Fletch tackles the metadata-management bottleneck in distributed file systems by moving metadata caching into programmable switches. It introduces path-aware cache management, multi-level read-write locking, and local hash collision resolution to handle the path semantics of file systems within switch resource constraints. On a Tofino-based testbed, Fletch achieves up to 181.6% higher throughput than vanilla HDFS and up to 139.6% additional throughput when combined with client-side caching, with low latency and limited switch resource usage. The work shows that in-switch caching can significantly improve metadata throughput in large-scale file-system deployments without sacrificing correctness or consistency, making it a practical complement to existing client-side strategies.

Abstract

Fast and scalable metadata management across multiple metadata servers is crucial for distributed file systems to handle numerous files and directories. Client-side caching of frequently accessed metadata can mitigate server loads, but incurs significant overhead and complexity in maintaining cache consistency as the number of clients increases. We propose Fletch, an in-switch file-system metadata caching framework that leverages programmable switches to serve file-system metadata requests from multiple clients directly in the switch data plane. Unlike prior in-switch key-value caching approaches, which do not address the path dependencies specific to file-system semantics, Fletch handles these dependencies under stringent switch resource constraints. We implement Fletch atop Hadoop HDFS and evaluate it on a Tofino-switch testbed using real-world file-system metadata workloads. Fletch achieves up to 181.6% higher throughput than vanilla HDFS and complements client-side caching with additional throughput gains of up to 139.6%. It also incurs low latencies and limited switch resource usage.

Paper Structure

This paper contains 22 sections, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Data plane of a programmable switch.
  • Figure 2: Fletch's architecture.
  • Figure 3: Example of cache admission and eviction workflows.
  • Figure 4: Example of processing a read request under multi-level read-write locking.
  • Figure 5: Example of token generation and distribution. Note that the controller also assigns tokens for $p$'s uncached ancestors in Steps 2-4. We omit the details for brevity.
  • ...and 9 more figures