Improving a Parallel C++ Intel AVX-512 SIMD Linear Genetic Programming Interpreter
William B. Langdon
TL;DR
The paper shows that Magpie can automatically find small, correct AVX-512 SIMD optimizations for GPengine's parallel interpreter, building on manual AVX work and expressing the changes as XML-driven mutations. It documents a workflow that combines intrinsics from the Intel Intrinsics Guide, XML-based edits, and a Linux mprotect sandbox to evaluate mutations safely. Results show substantial speedups over the SSE baseline, reaching up to ~3.9× faster performance and 3.5 Giga GP/s, while highlighting practical challenges in compilation, equivalence detection, and test coverage. The work emphasizes reproducible, hardware-aware optimization and discusses limitations and directions for broader transplantation and AVX exploration.
Abstract
We extend recent 256 SSE vector work to 512 AVX, giving a fourfold speedup. We use MAGPIE (Machine Automated General Performance Improvement via Evolution of software) to speed up a C++ linear genetic programming interpreter. Local search is provided with three alternative hand-optimised codes, revision history and the Intel 512-bit AVX512VL documentation as C++ XML. Magpie is applied to the new Single Instruction Multiple Data (SIMD) parallel interpreter for Peter Nordin's linear genetic programming GPengine. Linux mprotect provides sandboxing, whilst performance is measured by perf instruction count. In both cases, in a matter of hours, local search reliably sped up 114 or 310 lines of manually written parallel SIMD code for the Intel Advanced Vector Extensions (AVX) by 2 percent.
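The abstract states only that Linux mprotect is used for sandboxing and does not describe the mechanism. As an illustration of the general idea, and not the paper's actual code, the sketch below places an inaccessible guard page after a writable buffer, so that a mutated program writing past the end is stopped by SIGSEGV instead of silently corrupting memory. The names `GuardedBuffer`, `make_guarded`, `free_guarded` and `signal_from_write` are hypothetical, and the layout (one guard page after the buffer) is an assumption.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper type: a writable region followed by one guard page.
struct GuardedBuffer {
    unsigned char* base;  // start of the mapping
    std::size_t usable;   // bytes that may be read and written
    std::size_t page;     // system page size (size of the guard page)
};

// Map `pages` writable pages plus one trailing page, then revoke all
// access to the trailing page with mprotect(PROT_NONE).
GuardedBuffer make_guarded(std::size_t pages) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t usable = pages * page;
    void* p = mmap(nullptr, usable + page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    unsigned char* base = static_cast<unsigned char*>(p);
    const int rc = mprotect(base + usable, page, PROT_NONE);
    assert(rc == 0);
    (void)rc;  // silence unused warning when asserts are compiled out
    return GuardedBuffer{base, usable, page};
}

void free_guarded(const GuardedBuffer& b) {
    munmap(b.base, b.usable + b.page);
}

// Perform a one-byte write at `offset` in a child process, standing in for
// running an untrusted mutant. Returns the signal that killed the child,
// 0 if the child exited normally, or -1 if fork/waitpid failed.
int signal_from_write(const GuardedBuffer& b, std::size_t offset) {
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        volatile unsigned char* q = b.base + offset;
        *q = 1;    // faults if offset lands in the guard page
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `signal_from_write(b, 0)` exercises an in-bounds write, while `signal_from_write(b, b.usable)` writes to the first byte of the guard page. Running the write in a forked child keeps the fault from terminating the process that is doing the evaluation.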
