Swift: Rethinking RDMA Control Plane for Elastic Computing

Junxue Zhang, Han Tian, Xinyang Huang, Wenxue Li, Kaiqiang Xu, Dian Shen, Yong Wang, Kai Chen

TL;DR

This work addresses RDMA control-plane bottlenecks in elastic computing, where frequent task startups magnify connection setup costs. It introduces Swift, a simple user-space RDMA control plane that is co-designed with a serverless framework, uses cache-optimized libibverbs, and supports fork-based resource sharing. The key findings are that libibverbs connection setup can be accelerated substantially through caching, and that RDMA resources can be shared across processes using fork without prohibitive data-plane penalties. Swift achieves control-plane performance comparable to the kernel-based KRCore while improving the data plane, with 30.56-46.50% higher throughput and 18.55-37.21% lower latency. The work also demonstrates practical compatibility across Linux kernels and offers a viable path to high-performance RDMA-enabled elastic computing in real-world serverless environments.

Abstract

Elastic computing enables dynamic scaling to meet workload demands, and Remote Direct Memory Access (RDMA) enhances this by providing high-throughput, low-latency network communication. However, integrating RDMA into elastic computing remains a challenge, particularly in control plane operations for RDMA connection setup. This paper revisits the assumptions of prior work on high-performance RDMA for elastic computing, and reveals that extreme microsecond-level control plane optimizations are often unnecessary. By challenging the conventional beliefs about the slowness of the user-space RDMA control plane and the difficulty of user-space RDMA resource sharing, we uncover new design opportunities. Our key insight is that user-space RDMA connection setup can be significantly improved with caching, while RDMA resources can be efficiently shared among processes using fork. In light of this, we propose Swift, a simple yet effective solution that co-designs RDMA with a serverless framework to optimize performance for elastic computing. At its core, Swift handles cold and warm serverless requests by swiftly initializing the RDMA control plane with cache-optimized libibverbs, and manages fork requests by leveraging RDMA's fork capability. Implemented with OpenWhisk, Swift delivers 30.56-46.50% higher average throughput and 18.55-37.21% lower latency, at a cost of 6.5% control plane overhead, compared to prior solutions.
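
The two ideas in the abstract, caching the results of slow setup work and letting a forked process reuse what its parent already built, can be illustrated with a toy model. The sketch below is not Swift's implementation and does not call libibverbs: `expensive_setup`, `get_context`, `run_in_forked_child`, the `stats` counter, and the device name `mlx5_0` are all hypothetical stand-ins chosen for illustration. Real RDMA fork support involves additional steps (libibverbs provides `ibv_fork_init()` for fork safety) that this model does not attempt to reproduce.

```python
import os
import time

# Hypothetical stand-in for slow control-plane setup; NOT the libibverbs API.
stats = {"setup_calls": 0}
_setup_cache = {}


def expensive_setup(device_name):
    """Simulate a slow setup step and count how often it runs."""
    stats["setup_calls"] += 1
    time.sleep(0.01)  # placeholder for real setup latency
    return {"device": device_name, "handle": stats["setup_calls"]}


def get_context(device_name):
    """Return a cached context; run the slow setup only on a cache miss."""
    if device_name not in _setup_cache:
        _setup_cache[device_name] = expensive_setup(device_name)
    return _setup_cache[device_name]


def run_in_forked_child(device_name):
    """Fork a child, fetch the context there, and report how many
    setup calls the child itself had to make (0 means it reused
    state inherited from the parent)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's cache and counter.
        os.close(read_fd)
        before = stats["setup_calls"]
        get_context(device_name)
        os.write(write_fd, str(stats["setup_calls"] - before).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: collect the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode())


if __name__ == "__main__":
    get_context("mlx5_0")  # first request pays the setup cost
    get_context("mlx5_0")  # repeat request hits the cache
    print("setup calls in parent:", stats["setup_calls"])
    print("setup calls in forked child:", run_in_forked_child("mlx5_0"))
```

In this model the forked child performs no setup of its own because it inherits the parent's cache through copy-on-write memory. Whether and how that maps onto real RDMA resources (queue pairs, registered memory) is exactly what the paper investigates, so the sketch should be read only as an analogy.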

Paper Structure

This paper contains 33 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Critical Path of RDMA in Elastic Computing: Unlike conventional RDMA use cases, elastic computing demands RDMA control plane setup each time a task is launched.
  • Figure 2: Workflow of libibverbs API calls during RDMA control plane setup, along with their execution times in both user space and kernel space.
  • Figure 3: Workflow of proposed caching mechanism.
  • Figure 4: Workflow of Swift. The workflow of each request is marked with a different color.
  • Figure 5: Relationship of tables used in Swift.
  • ...and 5 more figures