Matrix representation and GPU-optimized parallel B-spline computing

Jiayu Wu, Qiang Zou

TL;DR

This work addresses the bottlenecks of CPU-centric B-spline evaluation in CAD by introducing a matrix-based representation (M-rep) that reformulates B-spline operations as regular matrix additions and multiplications, enabling efficient GPU execution. By decomposing high-degree B-splines into cubic Béziers with controllable error and converting all subsequent operations into matrix form, the approach leverages tensor cores and warp-centric scheduling to accelerate projection and inversion by about two orders of magnitude over existing methods. Key innovations include an error-controlled decomposition, monotonic Bézier segmentation, Bézier clipping for projection/inversion, and a GPU-aware workload sharing strategy (PCWS) to maintain high parallel efficiency. The method demonstrates strong performance and robustness on large-scale 2D/3D B-spline tasks, while also outlining practical limitations (non-rational B-splines, ultra-high degrees) and future work to broaden applicability to more CAD/CAM tasks.

Abstract

B-spline modeling is fundamental to CAD systems, yet the evaluation and manipulation algorithms currently in use were developed decades ago, specifically for CPU architectures. While still effective for many applications, these algorithms become increasingly inadequate as CAD models grow more complex, as with large-scale assemblies and microstructures. GPU acceleration offers a promising solution, but most existing GPU B-spline algorithms simply adapt their CPU counterparts without accounting for the mismatch between the unstructured, recursive nature of B-splines and the structured nature of GPU kernels, and so fail to fully leverage GPU capabilities. This paper presents a novel approach that transforms B-spline representations into regular matrix structures, reducing all evaluation and manipulation computations to matrix addition and multiplication and thus aligning better with GPU architecture. By combining this matrix representation with GPU-optimized task scheduling and memory access patterns, the paper demonstrates significant performance improvements in two key B-spline operations, inversion and projection. Experimental results show a speedup of about two orders of magnitude over existing methods.
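To make the idea of reducing spline evaluation to matrix products concrete, the sketch below evaluates a single cubic Bézier segment as a product of a power-basis row, the standard cubic Bernstein-to-power-basis matrix, and the control points. It is a textbook illustration only, not the paper's M-rep or its GPU kernels; the names `BEZIER3`, `matmul`, `eval_bezier_matrix`, and `de_casteljau` are hypothetical, and plain Python lists stand in for the dense matrix hardware a GPU would use.

```python
# Standard cubic Bernstein-to-power-basis matrix: C(t) = [1, t, t^2, t^3] @ BEZIER3 @ P
BEZIER3 = [
    [ 1.0,  0.0,  0.0, 0.0],
    [-3.0,  3.0,  0.0, 0.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-1.0,  3.0, -3.0, 1.0],
]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def eval_bezier_matrix(ctrl, ts):
    """Evaluate a cubic Bezier at every parameter in ts with two matrix products.

    ctrl is a 4 x d list of control points; the result is len(ts) x d.
    """
    powers = [[1.0, t, t * t, t * t * t] for t in ts]  # n x 4 power-basis rows
    coeffs = matmul(BEZIER3, ctrl)                     # 4 x d polynomial coefficients
    return matmul(powers, coeffs)                      # n x d curve points


def de_casteljau(ctrl, t):
    """Reference evaluation by repeated linear interpolation (the recursive form)."""
    pts = [list(p) for p in ctrl]
    while len(pts) > 1:
        pts = [[(1 - t) * a + t * b for a, b in zip(p, q)]
               for p, q in zip(pts, pts[1:])]
    return pts[0]


if __name__ == "__main__":
    ctrl = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]]
    ts = [0.0, 0.25, 0.5, 0.75, 1.0]
    for t, point in zip(ts, eval_bezier_matrix(ctrl, ts)):
        print(t, point, de_casteljau(ctrl, t))
```

The matrix version handles all parameter values in one regular batch, with no data-dependent branching, whereas the recursive version processes one parameter at a time. That regularity is the property the abstract points to as a better fit for GPU execution.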

Paper Structure

This paper contains 28 sections, 77 equations, 21 figures, and 9 tables.

Figures (21)

  • Figure 1: An overview of GPU architecture.
  • Figure 2: The pipeline of M-rep.
  • Figure 3: The workflow of vectorizing the polynomial coefficients of ${N_{q-3,3}}(t)$ in $A_{q,p,T}$ within the interval $t \in [t_q, t_{q+1}]$.
  • Figure 4: Pipeline of tensor core-accelerated B-spline decomposition.
  • Figure 5: The degree-reduced curve after $L_1$-error minimization (left) and $L_2$-error minimization (right).
  • ...and 16 more figures