Microsecond-scale Dynamic Validation of Idempotency for GPU Kernels

Mingcong Han, Weihang Shen, Guanwen Peng, Rong Chen, Haibo Chen

TL;DR

PICKER, the first system for instance-level idempotency validation, dynamically validates the idempotency of GPU kernel instances before their execution by using their launch arguments.

Abstract

We discovered that a GPU kernel can have both idempotent and non-idempotent instances depending on the input. These kernels, called conditionally-idempotent, are prevalent in real-world GPU applications (490 out of 547 from six applications). Consequently, prior work that classifies GPU kernels as either idempotent or non-idempotent can severely compromise the correctness or efficiency of idempotence-based systems. This paper presents PICKER, the first system for instance-level idempotency validation. PICKER dynamically validates the idempotency of GPU kernel instances before their execution, by utilizing their launch arguments. Several optimizations are proposed to significantly reduce validation latency to microsecond-scale. Evaluations using representative GPU applications (547 kernels and 18,217 instances in total) show that PICKER can identify idempotent instances with no false positives and a false-negative rate of 18.54%, and can complete the validation within 5 us for all instances. Furthermore, by integrating PICKER, a fault-tolerant system can reduce the checkpoint cost to less than 4% and a scheduling system can reduce the preemption latency by 84.2%.
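To make the notion of a conditionally-idempotent kernel concrete, the following is a minimal Python sketch and not PICKER's actual algorithm. It assumes a hypothetical elementwise kernel `add_one` whose launch arguments are a source offset, a destination offset, and a length, and it simulates the kernel sequentially on a flat list standing in for GPU memory. The check `is_idempotent_instance` is likewise an illustrative assumption: it conservatively treats an instance as idempotent only when the ranges it reads and writes are disjoint.

```python
def add_one(mem, src, dst, n):
    """Hypothetical elementwise kernel, simulated sequentially:
    mem[dst + i] = mem[src + i] + 1 for i in [0, n)."""
    for i in range(n):
        mem[dst + i] = mem[src + i] + 1


def ranges_overlap(a, b, n):
    """True if [a, a + n) and [b, b + n) share at least one element."""
    return a < b + n and b < a + n


def is_idempotent_instance(src, dst, n):
    """Assumed conservative check on launch arguments only:
    the instance never overwrites anything it reads."""
    return not ranges_overlap(src, dst, n)


def rerun_is_safe(src, dst, n):
    """Ground truth by brute force: run once, run again, compare memory."""
    size = max(src, dst) + n
    once = list(range(size))
    add_one(once, src, dst, n)
    twice = list(once)
    add_one(twice, src, dst, n)
    return once == twice


# Same kernel, two instances that differ only in their launch arguments.
out_of_place = (0, 4, 4)  # reads [0, 4), writes [4, 8)
in_place = (0, 0, 4)      # reads and writes [0, 4)

for args in (out_of_place, in_place):
    print(args, is_idempotent_instance(*args), rerun_is_safe(*args))
```

The out-of-place instance can be re-executed without changing the result, while the in-place instance cannot, which is why classifying the kernel as a whole would be either unsafe or overly pessimistic.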

Paper Structure

This paper contains 23 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Simplified version of an idempotent GPU kernel, a non-idempotent GPU kernel, and a conditionally-idempotent GPU kernel with its two instances. bid, tid, gdim and bdim stand for the index and the dimension of a block and a thread, respectively.
  • Figure 2: Architecture of PICKER and the workflow of using PICKER to correctly and efficiently run GPU kernels on idempotence-based systems.
  • Figure 3: Pseudocode of dynamic idempotency validation.
  • Figure 4: An example of idempotency validation using our strawman solution. elemwise_relu is a simplified version of a cond-idempotent GPU kernel from PyTorch mentioned in $\S$\ref{subsec:bg-study}.
  • Figure 5: Examples of false-negative cases.
  • ...and 3 more figures