Warp-centric GPU meta-meshing and fast triangulation of billion-scale lattice structures
Qiang Zou, Yunzhu Gao
TL;DR
This work tackles the bottleneck of triangulating billion-scale lattice structures by introducing a lightweight meta-mesh representation that serves as a reusable base for multiresolution triangulation. It couples a warp-centric GPU meta-meshing algorithm with a two-level CPU–GPU asynchronous pipeline and a lossy data-compression scheme to hide data-transfer latency and reduce memory pressure. The approach decouples lattice geometry through auxiliary planes, uses a coalesced memory layout, and optimizes warp scheduling to dramatically increase parallelism, achieving minutes-scale triangulation on billion-scale models. Compared with traditional methods, the method yields substantially fewer triangles for the same accuracy and demonstrates scalable performance on real-world models, with acknowledged limitations and clear paths for future extension to broader lattice types and direct CAD workflows.
Abstract
Lattice structures have been widely used in applications due to their superior mechanical properties. To fabricate such structures, a geometric processing step called triangulation is often employed to transform them into the STL format before sending them to 3D printers. Because lattice structures tend to have high geometric complexity, this step usually generates a large number of triangles, making it a memory- and compute-intensive task. The problem is most pronounced for large-scale lattice structures that have millions or billions of struts. To address it, this paper proposes transforming a lattice structure into an intermediate model called a meta-mesh before it undergoes actual triangulation. Compared to triangular meshes, meta-meshes are very lightweight and much less compute-demanding. The meta-mesh can also work as a reusable base mesh for conveniently and efficiently triangulating lattice structures at arbitrary resolutions. A CPU+GPU asynchronous meta-meshing pipeline has been developed to efficiently generate meta-meshes from lattice structures. It shifts from the thread-centric GPU algorithm design paradigm commonly used in CAD to the recent warp-centric design paradigm to achieve high performance. This is achieved by a new data compression method, a GPU cache-aware data structure, and a workload-balanced scheduling method that can significantly reduce memory divergence and branch divergence. In experiments with various billion-scale lattice structures, the proposed method is seen to be two orders of magnitude faster than previously achievable.
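
To make the contrast between the thread-centric and warp-centric paradigms concrete, the following is a minimal sketch of a toy cost model, not the paper's algorithm. It assumes a single 32-lane warp executing in lock step, and it counts lock-step iterations for a hypothetical workload in which each task (for example, one lattice node) has a different number of items to process. The function names, the workload, and the cost model itself are illustrative assumptions.

```python
import math

WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def thread_centric_steps(work_items):
    """One lane per task: each group of 32 tasks takes as long as its slowest lane."""
    steps = 0
    for start in range(0, len(work_items), WARP_SIZE):
        steps += max(work_items[start:start + WARP_SIZE])
    return steps


def warp_centric_steps(work_items):
    """One warp per task: the 32 lanes share each task's items evenly."""
    return sum(math.ceil(w / WARP_SIZE) for w in work_items)


# Hypothetical, deliberately unbalanced workload: 64 tasks, two of them large.
work = [4] * 64
work[0] = 640
work[40] = 320

print(thread_centric_steps(work))  # 960
print(warp_centric_steps(work))    # 92
```

In this toy model the warp-centric assignment wins because two heavy tasks no longer stall the 31 other lanes in their group. With uniformly tiny tasks the ordering reverses (64 tasks of 4 items cost 8 steps thread-centric versus 64 warp-centric), which is one reason workload-balanced scheduling matters alongside the paradigm shift.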
