Telepathic Datacenters: Fast RPCs using Shared CXL Memory

Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson

TL;DR

RPCool introduces a fast, secure RPC framework that exploits CXL-based shared memory to pass native pointer-rich data without serialization, while enforcing safety through lightweight sandboxes and seals. It integrates a global orchestrator, per-connection heaps, and a daemon/kernel stack to manage channels, leases, and memory, and it provides seamless RDMA fallback to scale beyond rack-scale CXL. The paper demonstrates substantial latency reductions and competitive throughput across microbenchmarks and real applications (Memcached, MongoDB, CoolDB, and DeathStarBench), with concrete improvements over RDMA and CXL baselines and robust handling of safety, failures, and memory management. Overall, RPCool enables fast, secure cross-host RPCs in CXL-enabled datacenters while transparently scaling via RDMA when necessary, making pointer-rich data sharing feasible at datacenter scale with controlled safety guarantees.

Abstract

Datacenter applications often rely on remote procedure calls (RPCs) for fast, efficient, and secure communication. However, RPCs are slow, inefficient, and hard to use as they require expensive serialization and compression to communicate over a packetized serial network link. Compute Express Link 3.0 (CXL) offers an alternative solution, allowing applications to share data using a cache-coherent, shared-memory interface across clusters of machines. RPCool is a new framework that exploits CXL's shared memory capabilities. RPCool avoids serialization by passing pointers to data structures in shared memory. While avoiding serialization is useful, directly sharing pointer-rich data eliminates the isolation that copying data over traditional networks provides, leaving the receiver vulnerable to invalid pointers and concurrent updates to shared data by the sender. RPCool restores this safety with careful and efficient management of memory permissions. Another significant challenge with CXL shared memory capabilities is that they are unlikely to scale to an entire datacenter. RPCool addresses this by falling back to RDMA-based communication. Overall, RPCool reduces the round-trip latency by 1.93$\times$ and 7.2$\times$ compared to state-of-the-art RDMA and CXL-based RPC mechanisms, respectively. Moreover, RPCool performs either comparably or better than other RPC mechanisms across a range of workloads.
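
The hazard the abstract describes (a receiver following sender-supplied pointers in shared memory) can be illustrated with a small sketch. This is not RPCool's mechanism: the abstract says RPCool restores safety through memory permission management, whereas the sketch below substitutes a plain software bounds check as a stand-in. The names `write_list`, `read_list`, `NODE`, and `HEAP_SIZE` are hypothetical, an anonymous `mmap` region stands in for CXL shared memory, and offsets stand in for native pointers.

```python
import mmap
import struct

HEAP_SIZE = 4096                 # hypothetical size of the shared heap
NODE = struct.Struct("<qq")      # (value, next_offset) per list node
NIL = -1                         # next_offset value that ends the list

def write_list(heap, values, start=0):
    """Sender side: lay out a linked list in the heap; return the head offset."""
    if not values:
        return NIL
    off = start
    for i, v in enumerate(values):
        nxt = off + NODE.size if i + 1 < len(values) else NIL
        NODE.pack_into(heap, off, v, nxt)
        off += NODE.size
    return start

def read_list(heap, head):
    """Receiver side: walk the list in place, rejecting offsets that leave the heap."""
    out, off, seen = [], head, 0
    max_nodes = len(heap) // NODE.size
    while off != NIL:
        if off < 0 or off + NODE.size > len(heap):
            raise ValueError(f"pointer {off} escapes the shared heap")
        if seen >= max_nodes:
            raise ValueError("cycle detected in shared list")
        value, off = NODE.unpack_from(heap, off)
        out.append(value)
        seen += 1
    return out

heap = mmap.mmap(-1, HEAP_SIZE)  # anonymous mapping standing in for shared memory
head = write_list(heap, [10, 20, 30])
result = read_list(heap, head)   # only the head offset is "sent"; no data is copied
```

Only the head offset crosses between sender and receiver, so the list itself is never serialized. The receiver pays for that with a check on every hop, and the check does nothing against the sender modifying the list mid-traversal, which is the second hazard the abstract names.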

Paper Structure (48 sections, 13 figures, 1 table)

This paper contains 48 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: RTT comparison of several communication protocols.
  • Figure 2: Expected CXL v3+ in the datacenter alongside RDMA.
  • Figure 3: RPCool's System Architecture.
  • Figure 4: Channels, connections, and heaps in RPCool.
  • Figure 5: Two possible failure scenarios in RPCool. (a) Server crash results in an orphaned heap. (b) Client left with heaps after multiple servers crash.
  • ...and 8 more figures