VQhull: a Fast Planar Quickhull
Thomas Koopman, Jordy Aaldering, Bernard van Gastel, Sven-Bodo Scholz
TL;DR
VQhull is a vectorized, parallel implementation of Quickhull for planar convex hulls that minimizes data movement and makes full use of the available memory bandwidth. By combining a vectorized in-place subset extraction with a two-phase parallelization (a parallel step followed by a cleanup step), it achieves sequential speedups of 1.6–16× and parallel speedups of 1.5–11× over the state of the art, while approaching hardware bandwidth limits on non-NUMA systems. The work includes extensive benchmarking across three platforms and three PBBS datasets, with a discussion of branch behavior, vectorization, memory subsystem effects, and energy consumption. The findings show that runtime and energy consumption can decouple in bandwidth-bound geometric algorithms: a parallel run with low parallel efficiency may still use less energy than a sequential one. They also point to future directions such as heuristics that reduce memory-bandwidth demand and NUMA-aware parallel strategies.
Abstract
Finding the convex hull is a fundamental problem in computational geometry. Quickhull is a fast algorithm for finding convex hulls. In this paper, we present VQhull, a fast parallel implementation of Quickhull that exploits vector instructions and coordinates CPU cores in a way that minimizes data movement. This implementation obtains a sequential runtime improvement of 1.6--16x and a parallel runtime improvement of 1.5--11x compared to the state of the art on the Problem Based Benchmark Suite. VQhull achieves 85--100% of non-NUMA architectures' peak bandwidth, and 66--78% on our two-CPU NUMA system. This leaves little room for further improvements. A 4x speedup on 8 cores has a parallel efficiency of 50%. This suggests a waste of energy, but our measurements show a more complicated picture: energy usage may even be lower in parallel. Quickhull serves as a case study that runtime and energy consumption do not go hand in hand.
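
For readers unfamiliar with the underlying algorithm, the following is a minimal sketch of textbook planar Quickhull in Python. It is not the VQhull implementation: it is sequential, unvectorized, and allocates a fresh list for every subset instead of extracting subsets in place. The names `cross`, `_hull_side`, and `quickhull` are illustrative choices, not taken from the paper.

```python
def cross(a, b, p):
    """Twice the signed area of triangle (a, b, p).

    Positive when p lies strictly to the left of the directed line a -> b.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _hull_side(points, a, b):
    """Hull vertices strictly left of a -> b, ordered from a to b.

    The endpoints a and b themselves are not included in the result.
    """
    # Subset extraction: keep only the points left of the line a -> b.
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the line is guaranteed to be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    # Recurse on the two new edges a -> far and far -> b.
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    """Convex hull of 2D points as a list of vertices in clockwise order.

    Collinear points on hull edges are dropped. Assumes points are
    (x, y) tuples; duplicates are removed first.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # leftmost and rightmost points
    upper = _hull_side(pts, lo, hi)  # points above the line lo -> hi
    lower = _hull_side(pts, hi, lo)  # points below the line lo -> hi
    return [lo] + upper + [hi] + lower
```

Each call to `_hull_side` reads every candidate point once to compute `cross` and once more to select the farthest point, which gives an informal sense of why an optimized implementation of this algorithm tends to be limited by memory bandwidth rather than arithmetic.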
