Matrix representation and GPU-optimized parallel B-spline computing
Jiayu Wu, Qiang Zou
TL;DR
This work addresses the bottlenecks of CPU-centric B-spline evaluation in CAD by introducing a matrix-based representation (M-rep) that reformulates B-spline operations as regular matrix additions and multiplications, enabling efficient GPU execution. By decomposing high-degree B-splines into cubic Béziers with controllable error and converting all subsequent operations into matrix form, the approach leverages tensor cores and warp-centric scheduling to accelerate projection and inversion by about two orders of magnitude over existing methods. Key innovations include an error-controlled decomposition, monotonic Bézier segmentation, Bézier clipping for projection/inversion, and a GPU-aware workload sharing strategy (PCWS) to maintain high parallel efficiency. The method demonstrates strong performance and robustness on large-scale 2D/3D B-spline tasks, while also outlining practical limitations (non-rational B-splines, ultra-high degrees) and future work to broaden applicability to more CAD/CAM tasks.
Abstract
B-spline modeling is fundamental to CAD systems, yet the evaluation and manipulation algorithms currently in use were developed decades ago, specifically for CPU architectures. Although these algorithms remain effective for many applications, they become increasingly inadequate as CAD models grow more complex, as with large-scale assemblies and microstructures. GPU acceleration offers a promising solution, but most existing GPU B-spline algorithms simply adapt their CPU counterparts without accounting for the mismatch between the unstructured, recursive nature of B-splines and the structured nature of GPU kernels, and so fail to fully leverage GPU capabilities. This paper presents a novel approach that transforms B-spline representations into regular matrix structures, reducing all evaluation and manipulation computations to matrix addition and multiplication and thus aligning them better with GPU architecture. By combining this matrix representation with GPU-optimized task scheduling and memory access patterns, the paper demonstrates significant performance improvements in the key B-spline operations of inversion and projection. Experimental results show a speedup of about two orders of magnitude over existing methods.
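To illustrate how evaluation reduces to matrix multiplication, here is a minimal sketch for a single cubic Bézier segment, the building block the TL;DR says higher-degree B-splines are decomposed into. It is not the paper's M-rep or its GPU kernels: the function names (`eval_bezier_matrix`, `de_casteljau`, `matmul`) and the sample control points are assumptions made for illustration, and plain Python lists stand in for GPU matrices. The sketch evaluates many parameter values in one product T · M · P and checks the result against the recursive de Casteljau scheme.

```python
# Power-basis matrix of the cubic Bernstein polynomials:
# [1, t, t^2, t^3] * BEZIER_M = [B0(t), B1(t), B2(t), B3(t)].
BEZIER_M = [
    [1, 0, 0, 0],
    [-3, 3, 0, 0],
    [3, -6, 3, 0],
    [-1, 3, -3, 1],
]


def matmul(a, b):
    """Dense matrix product of two row-major nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def eval_bezier_matrix(ctrl, ts):
    """Evaluate a cubic Bezier at every t in ts as (T * M) * P.

    ctrl: four control points (rows of P), ts: parameter values in [0, 1].
    Returns one point per parameter value.
    """
    T = [[1.0, t, t * t, t * t * t] for t in ts]  # one row per parameter
    return matmul(matmul(T, BEZIER_M), [list(p) for p in ctrl])


def de_casteljau(ctrl, t):
    """Reference evaluation by repeated linear interpolation (recursive form)."""
    pts = [list(p) for p in ctrl]
    while len(pts) > 1:
        pts = [
            [(1 - t) * a + t * b for a, b in zip(p, q)]
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


# Hypothetical sample data: a planar cubic with four control points.
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
ts = [0.0, 0.25, 0.5, 1.0]
points = eval_bezier_matrix(ctrl, ts)
```

The matrix form has a fixed shape and no data-dependent branching, so all parameter values share the same sequence of multiply-adds; this regularity is the property the abstract contrasts with the recursive structure of conventional B-spline algorithms.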
