FlexBSO: Flexible Block Storage Offload for Datacenters
Vojtech Aschenbrenner, John Shawger, Sadman Sakib
TL;DR
FlexBSO tackles the overhead, rigidity, and VM exit penalties of traditional block-device virtualization by offloading the storage stack to a Bluefield-2 SmartNIC using NVIDIA SNAP and SR-IOV to present NVMe devices directly to guests. The approach leverages SPDK-based block devices on the DPU, implementing a RAID vbdev with a safe read mechanism and a DOCA-backed compression block device to demonstrate architectural flexibility. Experimental results show SNAP-based offload achieving up to 14 GB/s throughput with 16 μs read latency, significantly outperforming NVMe-oF RDMA in multi-threaded scenarios and reducing host CPU involvement. The work highlights the practicality of hardware-assisted offload for datacenter storage, reducing host load and VM exit costs while enabling adaptable storage backends; future directions include multi-tenant SR-IOV scalability and broader SPDK customization.
Abstract
Efficient virtualization of CPU and memory is standardized and mature. Manufacturers have added capabilities such as Intel VT-x [3] for efficient hypervisor support. In contrast, a block device can be virtualized and presented to the virtual machines on a host in multiple ways. Indeed, hyperscalers develop in-house solutions to improve the performance and cost-efficiency of their datacenter storage. Unfortunately, these solutions are based on specialized hardware and software that are not publicly available. The traditional solution is to expose a virtual block device to the VM through a paravirtualized driver such as virtio [2]. Because the host OS and guest OS cooperate, virtio provides significantly better performance than emulating a real block device driver. The IO requests are then fulfilled by the host OS, either with a local block device such as an SSD or with some form of disaggregated storage over the network, such as NVMe-oF or iSCSI. There are three main problems with the traditional solution. 1) Cost. IO operations consume host CPU cycles because the host OS is involved. From the application's point of view, these cycles do no useful work. 2) Inflexibility. Any change to the virtualized storage stack requires host OS and/or guest OS cooperation and cannot be made silently in production. 3) Performance. IO operations cause recurring VM EXITs, each of which transitions the host CPU from non-root mode to root mode. This results in a significant IO performance penalty. We propose FlexBSO, a hardware-assisted solution that addresses all three issues. Our prototype is based on the publicly available Bluefield-2 SmartNIC with NVIDIA SNAP support and can therefore be deployed without obstacles.
