SIMT/GPU Data Race Verification using ISCC and Intermediary Code Representations: A Case Study
Andrew Osterhout, Ganesh Gopalakrishnan
TL;DR
The paper tackles the problem of ensuring correct memory ordering in SIMT/GPU code across languages by performing memory-model verification on lower-level intermediary representations (LLVM IR, CUDA PTX, and related forms). It advances a methodology based on ISCC to construct and test memory-dependency constraints (RaW, WaW, WaR) and applies it to a Ge-SpMM case study implemented in CUDA C++ and Julia CUDA.jl, aiming to verify data-race freedom. The results demonstrate the feasibility of generating ISCC representations from intermediary representations and performing the dependency checks, with Julia-generated IR presenting practical challenges due to IR boilerplate. This work highlights the potential of cross-language, intermediary-representation memory-race verification for GPU kernels and points to the need for fuller automation and broader validation to generalize the approach.
Abstract
It is often difficult to write code that you can ensure will be executed in the right order when programing for parallel compute tasks. Due to the way that today's parallel compute hardware, primarily Graphical Processing Units (GPUs), allows you to write code. It is easy to write code that may result in one thread reading or modifying data before it should, thus resulting in a data race. It would be useful to have a tool that could verify that the code will execute as expected. However, most static analysis done at the language level has to be completely retooled to work on a different languages. Therefore, it would be of great use to be able to perform verification and analysis on the Memory Model of a parallel compute code, in a lower level intermediary representations that most languages pass through on their way to something that the GPU hardware can understand. This body of work aims to deal with the question of if there is still enough of the information in the intermediary representations to be able to perform memory model verification to check for data races. To determine this we plan to analyze as a case study the GeSpMM Sparse Matrix Multiplication Algorithm, implemented in CUDA C++ with the LLVM compiler and Julia with CUDA.jl.
