
WIO: Upload-Enabled Computational Storage on CXL SSDs

Yiwei Yang, Yanpeng Hu, Yusheng Zheng, Estabon Ramos, Jianchang Su, Andi Quinn, Wei Zhang

Abstract

The widening gap between processor speed and storage latency has made data movement a dominant bottleneck in modern systems. Two lines of storage-layer innovation attempted to close this gap: persistent memory shortened the latency hierarchy, while computational storage devices pushed processing toward the data. Neither has displaced conventional NVMe SSDs at scale, largely due to programming complexity, ecosystem fragmentation, and thermal/power cliffs under sustained load. We argue that storage-side compute should be \emph{reversible}: computation should migrate dynamically between host and device based on runtime conditions. We present WIO, which realizes this principle on CXL SSDs by decomposing I/O-path logic into migratable \emph{storage actors} compiled to WebAssembly. Actors share state through coherent CXL.mem regions; an agility-aware scheduler migrates them via a zero-copy drain-and-switch protocol when thermal or power constraints arise. Our evaluation on an FPGA-based CXL SSD prototype and two production CSDs shows that WIO turns hard thermal cliffs into elastic trade-offs, achieving up to 2$\times$ throughput improvement and 3.75$\times$ write latency reduction without application modification.
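
The drain-and-switch protocol from the abstract can be made concrete with a short sketch. The C fragment below is illustrative only: the names (wio_actor, wio_submit_ok, wio_migrate) and the control-block layout are hypothetical, not the paper's actual API. It shows why quiescing in-flight requests before flipping an ownership word can move an actor between host and device without copying its state, given that the state lives in a coherent CXL.mem region visible to both sides.

    /*
     * Illustrative sketch of drain-and-switch actor migration.
     * All identifiers here are hypothetical; the paper's real
     * interface may differ.
     */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    enum wio_site { SITE_HOST = 0, SITE_DEVICE = 1 };

    /* Actor control block placed in a coherent CXL.mem region, so the
     * host and the device executor observe the same bytes directly. */
    struct wio_actor {
        _Atomic uint32_t owner;     /* site currently running the actor     */
        _Atomic uint32_t inflight;  /* requests admitted but not yet retired */
        _Atomic bool     draining;  /* new submissions refused while set     */
        uint8_t state[4096];        /* actor state; never copied on migrate  */
    };

    /* Admission check on the I/O submission path. */
    static bool wio_submit_ok(struct wio_actor *a)
    {
        if (atomic_load(&a->draining))
            return false;                     /* back-pressure the caller */
        atomic_fetch_add(&a->inflight, 1);
        return true;
    }

    static void wio_complete(struct wio_actor *a)
    {
        atomic_fetch_sub(&a->inflight, 1);
    }

    /* Drain-and-switch: quiesce in-flight work, then flip ownership.
     * The state array stays in place in the coherent region, which is
     * what makes the migration zero-copy. */
    void wio_migrate(struct wio_actor *a, enum wio_site target)
    {
        atomic_store(&a->draining, true);     /* 1. stop admitting work   */
        while (atomic_load(&a->inflight) != 0)
            ;                                 /* 2. wait for quiescence   */
        atomic_store(&a->owner, target);      /* 3. switch the executor   */
        atomic_store(&a->draining, false);    /* 4. resume at the target  */
    }

Because both executors already share the coherent region, step 3 changes only who runs the actor, not where its state resides; no serialization or DMA of actor state is needed.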

Figures (16)

  • Figure 1: Sustained write throughput (solid) and device temperature (dotted) over time. The CXL SSD with migration maintains throughput; Samsung and ScaleFlux suffer 50--60% drops from thermal throttling.
  • Figure 2: Sub-512 B I/O performance. CXL SSD achieves 5.4 $\mu$s latency at 8 B writes versus 38 $\mu$s (SmartSSD) and 80.6 $\mu$s (ScaleFlux).
  • Figure 3: WIO architecture overview with host domain, device domain, and coherent PMR shared region.
  • Figure 4: CXL SSD RTL datapath.
  • Figure 5: Evaluation breakdown of WIO across byte-addressable access, PMR behavior, queue scaling, WASM overhead, scheduler telemetry, and thermal stability.
  • ...and 11 more figures