Efficient Timestamping for Sampling-based Race Detection

Minjian Zhang, Daniel Wee Soong Lim, Mosaad Al Thokair, Umang Mathur, Mahesh Viswanathan

TL;DR

This work addresses the overhead of dynamic HB-race detection in sampling-based settings by introducing a freshness timestamp and compact data structures to reduce vector-clock work. It formulates and solves the Analysis Problem, showing that the analysis can be performed in time nearly proportional to the sample size with per-event costs tied to the number of threads, and introduces a nearly optimal algorithm that uses freshness timestamps and shallow copies via ordered lists. The authors prove bounds on vector-clock traversals and demonstrate instance-optimal behavior in practice, implementing and evaluating the approach in ThreadSanitizer and the RAPID offline framework. Experiments on real-world benchmarks (e.g., MySQL workloads) show substantial reductions in algorithmic overhead at low sampling rates without sacrificing a large fraction of race-detection capability, indicating practical impact for production-friendly sampling-based race detectors.

Abstract

Dynamic race detection based on the happens-before (HB) partial order has become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution, and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead. It offers the promise of making sound race detection scalable. In this work we consider the task of designing an efficient sampling-based race detector with low timestamping overhead when the number of sampled events is much smaller than the total number of events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that combines them to reduce the cost of timestamping in sampling-based race detection. Further, we prove that our algorithm is close to optimal: the number of vector clock traversals is bounded by the number of sampled events and the number of threads. Moreover, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution; that is, it is instance optimal. Our evaluation on real-world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.
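
For context on the baseline cost the abstract refers to, the following is a minimal sketch of conventional vector-clock timestamping in the Djit+ style: a join on every acquire and a copy on every release, each an O(T) traversal for T threads. It is not the algorithm proposed in the paper. The names `HBTimestamps`, `join`, `leq`, and the `traversals` counter are hypothetical and introduced only for illustration.

```python
def join(a, b):
    """Pointwise maximum of two vector clocks (one full traversal)."""
    return [max(x, y) for x, y in zip(a, b)]


def leq(a, b):
    """Pointwise comparison; for events of different threads,
    leq(C_e1, C_e2) holds exactly when e1 happens before e2."""
    return all(x <= y for x, y in zip(a, b))


class HBTimestamps:
    """Sampling-agnostic timestamping: work is done at every sync event."""

    def __init__(self, num_threads):
        # Thread t starts with local time 1 in its own component.
        self.thread = [[0] * num_threads for _ in range(num_threads)]
        for t in range(num_threads):
            self.thread[t][t] = 1
        self.lock = {}        # lock name -> vector clock of last release
        self.traversals = 0   # illustrative counter of O(T) operations

    def acquire(self, t, lock):
        if lock in self.lock:
            self.thread[t] = join(self.thread[t], self.lock[lock])
            self.traversals += 1

    def release(self, t, lock):
        self.lock[lock] = list(self.thread[t])  # full copy
        self.thread[t][t] += 1                  # start a new epoch
        self.traversals += 1

    def timestamp(self, t):
        return list(self.thread[t])


hb = HBTimestamps(2)
a = hb.timestamp(0)      # thread 0 accesses x, then releases lock l
hb.release(0, "l")
d = hb.timestamp(0)      # thread 0 accesses y after the release
hb.acquire(1, "l")
b = hb.timestamp(1)      # thread 1 accesses x and y after acquiring l

print(leq(a, b))                     # True: ordered via the lock, no race
print(leq(d, b) or leq(b, d))        # False: unordered, a potential race
print(hb.traversals)                 # 2: one copy plus one join
```

Every acquire and release here pays a full traversal whether or not any nearby access was sampled, which is the overhead a sampling-aware scheme aims to avoid.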

Paper Structure

This paper contains 26 sections, 9 theorems, 14 equations, 9 figures, 4 algorithms.

Key Result

Proposition 1

For an execution $\sigma$ and events $e_1, e_2 \in \mathsf{Events}_{\sigma}$ with $\mathsf{thr}(e_1) \neq \mathsf{thr}(e_2)$, we have:

Figures (9)

  • Figure 1: Example execution of a two-threaded program, shown on the left. Marked events that form the set $S$ are shaded in light blue. Arrows indicate that information is communicated from a release to the next acquire. Vector clocks maintained by Djit+ (FastTrack) are shown in the table in the middle. Columns 1 and 2 of the table show the clocks of threads $t_1$ and $t_2$, respectively. Column 3 of the table shows the clocks of locks $\ell_i$; to save space they are combined into one column. An entry ($\ell$, $\langle a,b \rangle$) in this column means that $\mathbb{C}_{\ell}$ has value $\langle a,b \rangle$ at that step. The table on the right shows the values of vector clocks maintained by Algorithm \ref{app:algo2}. Column 1 now records the local time of $t_1$, column 2 the vector clock of $t_1$, column 3 the local time of $t_2$, column 4 the vector clock of $t_2$, and column 5 the clock of lock $\ell_i$.
  • Figure 2: The same execution as Fig. \ref{fig:moti2}. The table on the right shows the values of vector clocks maintained by Algorithm \ref{app:smp-upd}. Acquires that can be skipped are shaded in light blue.
  • Figure 3: The figure on the left shows a release and an acquire of the same lock, performed by two threads in an execution of a program with 6 threads. The table on the right shows the vector clocks that Algorithm \ref{app:smp-upd} maintains for the two threads.
  • Figure 4: Example ordered list $O$ (left). Result of operation $O.\text{set}(t_4,6)$ (middle), and followed by $O.\text{inc}(t_1,1)$ (right).
  • Figure 5: Latency Relative to NT and Algorithmic Overhead Improvement
  • ...and 4 more figures
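
The caption of Figure 4 names two ordered-list operations, $O.\text{set}$ and $O.\text{inc}$, but this page does not show how the list is reordered. The sketch below assumes that each operation moves the updated entry to the front, so that the most recently changed components can be read without traversing the whole clock. The class `OrderedList`, the helper `recent`, and the initial contents are hypothetical and are not taken from the figure.

```python
class OrderedList:
    """Vector clock kept as (thread, value) pairs, assumed to be ordered
    from most recently updated to least recently updated."""

    def __init__(self, entries):
        self.entries = list(entries)

    def _remove(self, thread):
        # Linear scan keeps the sketch short; a linked list with a
        # per-thread node pointer would make this constant time.
        for i, (t, value) in enumerate(self.entries):
            if t == thread:
                del self.entries[i]
                return value
        return 0  # absent components are treated as 0

    def set(self, thread, value):
        self._remove(thread)
        self.entries.insert(0, (thread, value))

    def inc(self, thread, k):
        old = self._remove(thread)
        self.entries.insert(0, (thread, old + k))

    def recent(self, k):
        """The k most recently updated components."""
        return self.entries[:k]


# Made-up starting contents, used only to exercise the two operations.
O = OrderedList([("t1", 3), ("t2", 5), ("t3", 2), ("t4", 1)])
O.set("t4", 6)
O.inc("t1", 1)
print(O.entries)
# [('t1', 4), ('t4', 6), ('t2', 5), ('t3', 2)]
```

Under this assumption, a reader that knows only two components changed since its last visit can call `recent(2)` and skip the rest of the list.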

Theorems & Definitions (11)

  • Proposition 1
  • Lemma 2
  • Definition 1: Sampling Partial Order
  • Proposition 3
  • Lemma 4
  • Proposition 5
  • Proposition 6
  • Lemma 7
  • Example 1
  • Lemma 8
  • ...and 1 more