LAPIS: A Performance Portable, High Productivity Compiler Framework
Brian Kelley, Sivasankaran Rajamanickam
TL;DR
LAPIS is an extensible, MLIR-based compiler framework that targets performance portability and high productivity together. It introduces a Kokkos-inspired dialect and a dedicated emitter that generates portable C++ code. LAPIS automatically lowers sparse and dense linear algebra to Kokkos and bridges computational science and engineering (CSE) codes with AI/ML workflows such as PyTorch models, delivering portable performance across CPU and GPU architectures including AMD MI300A, NVIDIA H100, and Intel Granite Rapids. Key contributions include the Kokkos dialect, lowering passes for hierarchical parallelism, a DualView-inspired memory model, and a Kokkos emitter that outputs standalone C++ ready for integration into existing simulations. Empirical results show LAPIS achieving performance close to hand-optimized implementations for SpMV and GEMM, alongside successful end-to-end integrations such as MALA and ResNet18, which demonstrates its practical value for SciML and AI/ML co-design with portable backends. The work points toward broader front-end support and kernel coverage for cross-domain development on heterogeneous architectures.
Abstract
Portability, performance, and productivity are three critical dimensions for evaluating a programming model or compiler infrastructure. Several modern programming models for computational science emphasize performance and portability. At the other end, several machine-learning-focused programming models emphasize portability and productivity. A clear solution that is strong in all three dimensions has yet to emerge. A second, related problem arises when use cases from computational science converge with machine learning: the popular frameworks of the two fields are disparate, so programmers must manually integrate codes written in different frameworks. Finally, several programming frameworks lack easy options for extensibility, because any change in computer architecture requires complex changes to the programming model. We present LAPIS, an MLIR-based compiler that addresses all three of these challenges. We demonstrate that LAPIS can automatically lower sparse and dense linear algebra kernels from computational science and artificial intelligence use cases. We also show how LAPIS facilitates the integration of codes between PyTorch and Kokkos. We compare kernel performance with the default MLIR implementations on diverse architectures to demonstrate portability. Because its dialect is built on the principles of the Kokkos ecosystem, LAPIS can also be extended to new architectures.
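
As a concrete reference for the kind of sparse linear algebra kernel discussed above, the following is a minimal sketch of a sparse matrix-vector product (SpMV) over a matrix in compressed sparse row (CSR) form. It is plain serial C++ written for illustration only: it is not LAPIS output and does not use the Kokkos API, and the `CsrMatrix` struct and `spmv` function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration; not a LAPIS or Kokkos type.
struct CsrMatrix {
  std::size_t num_rows;
  std::vector<std::size_t> row_ptr;  // num_rows + 1 offsets into col_idx/values
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// Computes y = A * x for a CSR matrix A.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
  assert(A.row_ptr.size() == A.num_rows + 1);
  std::vector<double> y(A.num_rows, 0.0);
  // Outer loop: rows are independent of each other, so this is the loop a
  // portable backend would typically distribute across threads or teams.
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    double sum = 0.0;
    // Inner loop: a reduction over the stored entries of row i.
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[i] = sum;
  }
  return y;
}
```

The two nested loops show why such kernels suit hierarchical parallelism: the outer loop exposes independent work per row, while the inner loop is a short reduction whose length varies with the sparsity pattern.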
