Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments
Hao Zheng, Longxiang Wang, Yun Xu, Qiang Wang, Yibin Shen, Xiaoshe Dong, Bang Di, Jia Wei, Shenyu Dong, Xingjun Zhang, Weichen Chen, Zhao Han, Sanqian Zhao, Dongdong Huang, Jie Qi, Yifan Yang, Zhao Gao, Yi Wang, Jinhu Li, Xudong Ren, Min He, Hang Yang, Xiao Zheng, Haijiao Hao, Jiesheng Wu
TL;DR
Taiji tackles the challenge of memory elasticity for in-production DPUs by introducing a lightweight, hybrid virtualization architecture that enables full memory swappability without the overhead of traditional VM-based approaches. It combines a minimal virtualization layer with a parallel memory-elasticity engine, a multi-level LRU for hot/cold page tracking, and a high-throughput, low-latency swap path to expand memory by over $50\%$ while keeping virtualization overhead around $5\%$ and swap-in latency within the $10\,\mu\text{s}$ target for $90\%$ of cases. A unified resource scheduler ensures elasticity tasks run with minimal interference to high-performance I/O, and a hot-switch/hot-upgrade design enables seamless large-scale deployment across existing and new DPUs. Evaluation in production-scale settings shows negligible performance degradation, low code and metadata overhead, and strong elasticity benefits, with deployment on over $30{,}000$ DPUs and promising portability to other architectures. The work demonstrates a practical path to raising DPU resource utilization and extending hardware lifecycles in cloud data centers, enabling more aggressive software-defined scaling without frequent hardware upgrades.
Abstract
The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long hardware upgrade cycles and limited resources. To address these, we propose Taiji, a resource-elasticity architecture for DPUs. Combining hybrid virtualization with parallel memory swapping, Taiji switches the DPU's operating system (OS) into a guest OS and inserts a lightweight virtualization layer, making nearly all DPU memory swappable. It achieves memory overcommitment for the switched guest OS via high-performance memory elasticity, fully transparent to upper-layer applications, and supports hot-switch and hot-upgrade to meet in-production cloud requirements. Experiments show that Taiji expands DPU memory resources by over 50%, maintains virtualization overhead around 5%, and ensures 90% of swap-ins complete within 10 microseconds. Taiji delivers an efficient, reliable, low-overhead elasticity solution for DPUs and is deployed in large-scale production systems across more than 30,000 servers.
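
To make hot/cold page tracking for memory swapping concrete, the following is a minimal Python sketch of a generic multi-level LRU. It is an illustration under stated assumptions, not Taiji's implementation: the class `MultiLevelLRU`, its methods `touch`, `age`, and `pick_victims`, the three-level default, and the promote-by-one-level policy are all hypothetical choices made for this example. Pages start in the coldest level, move up one level on each access, move down one level on each aging pass, and swap-out candidates are taken from the coldest level first.

```python
from collections import OrderedDict


class MultiLevelLRU:
    """Toy multi-level LRU (hypothetical, not Taiji's actual data structure).

    Level 0 holds the coldest pages; the last level holds the hottest.
    Each level is an OrderedDict ordered from least to most recently used.
    """

    def __init__(self, levels=3):
        self.levels = [OrderedDict() for _ in range(levels)]
        self.level_of = {}  # page id -> index of the level it currently sits in

    def touch(self, page):
        """Record an access: new pages enter level 0, known pages go up one level."""
        if page in self.level_of:
            cur = self.level_of[page]
            del self.levels[cur][page]
            new = min(cur + 1, len(self.levels) - 1)
        else:
            new = 0
        self.levels[new][page] = True  # lands at the most-recently-used end
        self.level_of[page] = new

    def age(self):
        """Demote every page above level 0 by exactly one level (periodic scan)."""
        # Walk upward so a page demoted in this pass is not demoted twice.
        for i in range(1, len(self.levels)):
            demoted = self.levels[i]
            self.levels[i] = OrderedDict()
            for page in demoted:
                self.levels[i - 1][page] = True
                self.level_of[page] = i - 1

    def pick_victims(self, n):
        """Remove and return up to n swap-out candidates, coldest first."""
        victims = []
        for level in self.levels:
            while level and len(victims) < n:
                page, _ = level.popitem(last=False)  # least recently used entry
                del self.level_of[page]
                victims.append(page)
        return victims


if __name__ == "__main__":
    lru = MultiLevelLRU(levels=3)
    for p in ["a", "b", "c"]:
        lru.touch(p)
    lru.touch("a")
    lru.touch("a")
    lru.touch("b")
    print(lru.pick_victims(2))
```

In this sketch a frequently accessed page climbs away from the eviction end, so a single burst of accesses to cold pages does not immediately push hot pages out. A real swap engine would additionally need concurrency control and a bound on per-page metadata, which this example leaves out.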
