Data Layout Polymorphism for Bounding Volume Hierarchies
Christophe Gyurgyik, Alexander J Root, Fredrik Kjolstad
TL;DR
This paper introduces Scion, a system that decouples the data layout of bounding volume hierarchies from tree traversal algorithms through two domain-specific languages: a layout language for physical memory organization and a build language for transforming logical trees into physical representations. By enabling destructor and constructor specialization, Scion offers portable, architecture-agnostic optimization of BVH layouts and supports systematic exploration of a rich layout design space. Empirical results show that the Pareto frontier of layouts depends on the traversal algorithm, hardware, and data characteristics, and a novel layout (pbrt-q16) achieves Pareto-optimality across a majority of contexts. The findings demonstrate that layout-conscious design can yield substantial performance gains while preserving traversal portability, with generated code competitive with state-of-the-art kernels such as Embree, FCPW, and FCL.
Abstract
Bounding volume hierarchies are ubiquitous acceleration structures in graphics, scientific computing, and data analytics. Their performance depends critically on data layout choices that affect cache utilization, memory bandwidth, and vectorization, all increasingly dominant factors in modern computing. Yet, in most programming systems, these layout choices are hopelessly entangled with the traversal logic. This entanglement prevents developers from independently optimizing data layouts and algorithms across different contexts, perpetuating a false dichotomy between performance and portability. We introduce Scion, a domain-specific language and compiler for specifying the data layouts of bounding volume hierarchies independent of tree traversal algorithms. We show that Scion can express a broad spectrum of layout optimizations used in high-performance computing while remaining architecture-agnostic. We demonstrate empirically that Pareto-optimal layouts (along performance and memory footprint axes) vary across algorithms, architectures, and workload characteristics. Through systematic design exploration, we also identify a novel ray tracing layout that combines optimization techniques from prior work, achieving Pareto-optimality across diverse architectures and scenes.
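To make the idea of separating layout from traversal concrete, the following is a minimal Python sketch, not Scion's actual layout or build language. It assumes a toy hierarchy over 1D intervals, and all names (NODES, AosLayout, SoaLayout, query) are hypothetical, introduced only for illustration. One traversal routine is written against a small accessor interface and runs unchanged over an array-of-structs layout and a struct-of-arrays layout of the same logical tree.

```python
from array import array

# Hypothetical logical tree over 1D intervals (illustration only).
# Each node is (lo, hi, left, right, prim):
#   interior nodes have prim == -1, leaves have left == right == -1.
NODES = [
    (0.0, 10.0, 1, 2, -1),   # root
    (0.0, 4.0, -1, -1, 0),   # leaf holding primitive 0
    (5.0, 10.0, 3, 4, -1),   # interior node
    (5.0, 7.0, -1, -1, 1),   # leaf holding primitive 1
    (8.0, 10.0, -1, -1, 2),  # leaf holding primitive 2
]


class AosLayout:
    """Array-of-structs: all fields of a node are stored together."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def bounds(self, i):
        node = self.nodes[i]
        return node[0], node[1]

    def children(self, i):
        node = self.nodes[i]
        return node[2], node[3]

    def prim(self, i):
        return self.nodes[i][4]


class SoaLayout:
    """Struct-of-arrays: each field lives in its own packed array."""

    def __init__(self, nodes):
        self.lo = array("d", (n[0] for n in nodes))
        self.hi = array("d", (n[1] for n in nodes))
        self.left = array("i", (n[2] for n in nodes))
        self.right = array("i", (n[3] for n in nodes))
        self.prims = array("i", (n[4] for n in nodes))

    def bounds(self, i):
        return self.lo[i], self.hi[i]

    def children(self, i):
        return self.left[i], self.right[i]

    def prim(self, i):
        return self.prims[i]


def query(layout, x):
    """Return ids of primitives whose leaf interval contains x.

    The traversal only uses bounds/children/prim, so it does not
    depend on how the nodes are physically stored.
    """
    hits = []
    stack = [0]  # start at the root
    while stack:
        i = stack.pop()
        lo, hi = layout.bounds(i)
        if not (lo <= x <= hi):
            continue  # prune this subtree
        p = layout.prim(i)
        if p >= 0:
            hits.append(p)
        else:
            left, right = layout.children(i)
            stack.extend((right, left))  # visit left child first
    return hits
```

Calling query(AosLayout(NODES), 6.0) and query(SoaLayout(NODES), 6.0) returns the same list of primitive ids, which is the property a layout-polymorphic system has to preserve. In this sketch the indirection is resolved at run time through method calls; a compiler-based approach such as the one the abstract describes would instead be expected to specialize the traversal for each layout.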
