FlexBSO: Flexible Block Storage Offload for Datacenters

Vojtech Aschenbrenner, John Shawger, Sadman Sakib

TL;DR

FlexBSO tackles the overhead, rigidity, and VM exit penalties of traditional block-device virtualization by offloading the storage stack to a Bluefield-2 SmartNIC using NVIDIA SNAP and SR-IOV to present NVMe devices directly to guests. The approach leverages SPDK-based block devices on the DPU, implementing a RAID vbdev with a safe read mechanism and a DOCA-backed compression block device to demonstrate architectural flexibility. Experimental results show SNAP-based offload achieving up to 14 GB/s throughput with 16 μs read latency, significantly outperforming NVMe-oF RDMA in multi-threaded scenarios and reducing host CPU involvement. The work highlights the practicality of hardware-assisted offload for datacenter storage, reducing host load and VM exit costs while enabling adaptable storage backends; future directions include multi-tenant SR-IOV scalability and broader SPDK customization.

Abstract

Efficient virtualization of CPU and memory is standardized and mature. Manufacturers have added capabilities such as Intel VT-x [3] for efficient hypervisor support. In contrast, a block device can be virtualized and presented to the virtual machines on a host in multiple ways. Indeed, hyperscalers develop in-house solutions to improve the performance and cost-efficiency of their datacenter storage. Unfortunately, these solutions are based on specialized hardware and software that are not publicly available. The traditional solution is to expose a virtual block device to the VM through a paravirtualized driver like virtio [2]. Because the host OS and guest OS cooperate, virtio performs significantly better than emulation of a real block device driver. The IO requests are then fulfilled by the host OS, either with a local block device such as an SSD or with some form of disaggregated storage over the network, like NVMe-oF or iSCSI. There are three main problems with the traditional solution. 1) Cost. IO operations consume host CPU cycles due to host OS involvement. From the application's point of view, these cycles do no useful work. 2) Inflexibility. Any change to the virtualized storage stack requires host OS and/or guest OS cooperation and cannot be done silently in production. 3) Performance. IO operations cause recurring VM EXITs, each of which transitions the host CPU from non-root mode to root mode. This results in an excessive impact on IO performance. We propose FlexBSO, a hardware-assisted solution that solves all of these issues. Our prototype is based on the publicly available Bluefield-2 SmartNIC with NVIDIA SNAP support and can therefore be deployed without any obstacles.

Paper Structure (10 sections, 6 figures)

Figures (6)

  • Figure 1: SR-IOV allows using FlexBSO from multiple VMs without host OS intervention.
  • Figure 2: System diagram
  • Figure 3: RAID5 "safe read"
  • Figure 4: Throughput of the NVMe device exposed via SNAP and via NVMe over RDMA. The microbenchmark was run with FIO using a 1 MB block size, an IO depth of 32, direct IO, and a 60-second runtime, once with 4 threads and once with 1 thread. The SPDK backend was configured as a RAID1 device backed by a Null block device. SNAP reaches up to 14 GB/s of throughput, which makes it a suitable solution: many VMs can share this bandwidth without it becoming a bottleneck for common use cases.
  • Figure 5: Time required to perform compression and decompression individually using the zlib library and hardware acceleration
  • ...and 1 more figure
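
The FIO parameters given in the Figure 4 caption can be written out as a command line. The following is a minimal sketch, not the authors' actual invocation: the caption fixes only the block size, IO depth, direct IO, runtime, and thread counts. The job name, the device path `/dev/nvme0n1`, the `read` access pattern, and the `libaio` IO engine are assumptions introduced here for illustration, as is the helper name `fio_command`.

```python
import shlex


def fio_command(device, threads, rw="read", ioengine="libaio"):
    """Build an fio argument list using the parameters from the Figure 4 caption.

    `device`, `rw`, and `ioengine` are NOT stated in the caption; they are
    illustrative assumptions and should be adapted to the actual setup.
    """
    return [
        "fio",
        "--name=flexbso-throughput",  # hypothetical job name
        f"--filename={device}",       # assumed path of the exposed NVMe device
        f"--rw={rw}",                 # assumed access pattern
        f"--ioengine={ioengine}",     # assumed IO engine
        "--bs=1M",                    # 1 MB block size (from the caption)
        "--iodepth=32",               # IO depth of 32 (from the caption)
        "--direct=1",                 # direct IO (from the caption)
        "--runtime=60",               # 60-second runtime (from the caption)
        "--time_based",               # keep running for the full runtime
        "--thread",                   # run jobs as threads rather than processes
        f"--numjobs={threads}",       # 4 or 1 threads (from the caption)
        "--group_reporting",          # aggregate results across jobs
    ]


if __name__ == "__main__":
    # Print one command per thread count used in the caption.
    for threads in (4, 1):
        print(shlex.join(fio_command("/dev/nvme0n1", threads)))
```

Running the script only prints the two command lines; it does not start FIO, so the commands can be reviewed and the assumed values adjusted before use.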