Distance based prefetching algorithms for mining of the sporadic requests associations

Vadim Voevodkin, Andrey Sokolov

TL;DR

The paper tackles storage latency by improving the prefetching of sporadic read requests. It introduces the Distance Based Sporadic Prefetcher (DBSP), a lightweight algorithm that identifies request associations using distances between request histories together with three tables. The authors provide a rigorous evaluation methodology and show that DBSP outperforms the Mithril baseline, with modest gains in cache hit ratio at an acceptable storage overhead. The work also offers practical integration guidance and a framework for consistently comparing sporadic prefetchers across storage systems.
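To make the idea of "distances between request histories" concrete, here is a hypothetical toy in Python. It is not DBSP: this summary does not describe the three tables, so the sketch assumes only that each block keeps the timestamps (request indices) of its last few accesses and that two blocks count as associated when their history vectors are close. The names `build_histories`, `history_distance`, `find_associations` and the parameters `history_len` and `max_distance` are illustrative assumptions.

```python
from collections import defaultdict, deque


def build_histories(trace, history_len=4):
    """Keep the last `history_len` request indices (timestamps) of every block."""
    histories = defaultdict(lambda: deque(maxlen=history_len))
    for t, block in enumerate(trace):
        histories[block].append(t)
    return histories


def history_distance(h_a, h_b):
    """Mean absolute gap between aligned timestamps of two full histories.

    Returns None when either history is not yet full (too few accesses).
    """
    if len(h_a) != h_a.maxlen or len(h_b) != h_b.maxlen:
        return None
    return sum(abs(x - y) for x, y in zip(h_a, h_b)) / len(h_a)


def find_associations(trace, history_len=4, max_distance=2.0):
    """Return (block_a, block_b, distance) for every pair of close histories.

    Hypothetical all-pairs comparison, used here only for illustration.
    """
    histories = build_histories(trace, history_len)
    blocks = sorted(histories)
    pairs = []
    for i, a in enumerate(blocks):
        for b in blocks[i + 1:]:
            d = history_distance(histories[a], histories[b])
            if d is not None and d <= max_distance:
                pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    # Blocks 10 and 20 are always read back to back; 5, 7, 8, 99 are noise.
    trace = [10, 20, 5, 10, 20, 7, 10, 20, 8, 10, 20, 99]
    print(find_associations(trace))  # prints [(10, 20, 1.0)]
```

On this toy trace the histories of blocks 10 and 20 are `[0, 3, 6, 9]` and `[1, 4, 7, 10]`, so their distance is 1.0 and a prefetcher built on such pairs could fetch block 20 whenever block 10 is read. The all-pairs loop is quadratic in the number of distinct blocks, so a practical prefetcher would need a cheaper way to find candidate pairs.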

Abstract

Modern storage systems make intensive use of data prefetching algorithms when processing sequences of read requests. The performance of a prefetching algorithm (for instance, the increase in the cache hit ratio (CHR) of the cache system) directly affects the overall performance characteristics of the storage system (read latency, IOPS, etc.). Several widely known prefetching algorithms focus on discovering sequential patterns in the request stream. This study examines sporadic prefetching algorithms, a family of prefetching algorithms focused on mining pseudo-random (sporadic) patterns between read requests. The key contribution of this paper is a new, lightweight family of distance-based sporadic prefetching algorithms (DBSP) that outperforms the best previously known results on the MSR trace collection. Another important contribution is a thorough description of the procedure for comparing the performance of sporadic prefetchers.
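The abstract measures prefetcher quality through the cache hit ratio (CHR), and the figures also report precision. As a hedged sketch of how such metrics can be computed, the hypothetical harness below replays a block trace through an LRU cache and calls a prefetcher after every request. It assumes CHR is demand hits divided by total requests and that precision is the fraction of issued prefetches that receive a demand hit before eviction; the paper's exact definitions, its cache policy, and the `simulate`/`prefetch` interface are assumptions, not taken from the source.

```python
from collections import OrderedDict


def simulate(trace, capacity, prefetch):
    """Replay `trace` through an LRU cache with a prefetcher hook.

    prefetch(block) returns the blocks to load after a request (hypothetical API).
    Returns (cache_hit_ratio, precision).
    """
    cache = OrderedDict()  # block -> True while it is a prefetch not yet used
    hits = issued = useful = 0

    def insert(block, prefetched):
        cache[block] = prefetched
        cache.move_to_end(block)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    for block in trace:
        if block in cache:
            hits += 1
            if cache[block]:  # first demand hit on a prefetched block
                useful += 1
            cache[block] = False
            cache.move_to_end(block)
        else:
            insert(block, prefetched=False)
        for candidate in prefetch(block):
            if candidate not in cache:
                issued += 1
                insert(candidate, prefetched=True)

    hit_ratio = hits / len(trace) if trace else 0.0
    precision = useful / issued if issued else 0.0
    return hit_ratio, precision


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3]
    print(simulate(trace, 8, lambda b: []))       # no prefetching
    print(simulate(trace, 8, lambda b: [b + 1]))  # naive next-block prefetcher
```

With the naive next-block prefetcher and room for eight blocks, the prefetches of blocks 2 and 3 are used while the prefetch of block 4 is wasted, so the harness reports a CHR of 5/6 and a precision of 2/3; without prefetching the same trace yields a CHR of 0.5. A sporadic prefetcher would plug into the same `prefetch` hook, returning associated blocks instead of `b + 1`.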

Paper Structure

This paper contains 4 sections, 2 equations, 4 figures, 2 algorithms.

Figures (4)

  • Figure 1: Operating scheme of the storage system controller
  • Figure 2: Comparison of prefetching algorithms with different values of the relative prefetcher size. (a) Pareto front by Avg. Precision and Avg. Cache Hit Ratio with $s_c = 5$. (b) Pareto front by Avg. Precision and Avg. Cache Hit Ratio with $s_c = 7$.
  • Figure 3: Performance comparison of prefetching algorithms. (a) Pareto front by Avg. Precision and Avg. Cache Hit Ratio. (b) Pareto front by Avg. Storage Activity Ratio and Avg. Cache Hit Ratio.
  • Figure 4: Performance comparison of prefetching algorithms on traces from the MSR dataset. (a) S-curve by Precision. (b) S-curve by Cache Hit Ratio.
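Figures 2 and 3 compare prefetchers by Pareto fronts, and Figure 4 uses S-curves over individual MSR traces. The helpers below sketch both under stated assumptions: a configuration lies on the Pareto front if no other configuration is at least as good in both average precision and average cache hit ratio and strictly better in one, and an S-curve is taken to be the per-trace metric values sorted in ascending order and plotted against their rank. The function names `pareto_front` and `s_curve` and the sample points are hypothetical and are not results from the paper.

```python
def pareto_front(points):
    """Return the non-dominated (name, precision, hit_ratio) tuples.

    Higher is better on both axes; the front is sorted by precision.
    """
    front = []
    for name, p, h in points:
        dominated = any(
            p2 >= p and h2 >= h and (p2 > p or h2 > h)
            for _, p2, h2 in points
        )
        if not dominated:
            front.append((name, p, h))
    return sorted(front, key=lambda item: item[1])


def s_curve(per_trace_values):
    """Pair each per-trace metric value with its rank after ascending sort."""
    return list(enumerate(sorted(per_trace_values), start=1))


if __name__ == "__main__":
    # Arbitrary illustrative configurations, not measurements.
    configs = [("A", 0.2, 0.5), ("B", 0.4, 0.4), ("C", 0.1, 0.3), ("D", 0.3, 0.45)]
    print(pareto_front(configs))  # C is dominated by A; A, D, B remain
    print(s_curve([0.3, 0.1, 0.2]))  # prints [(1, 0.1), (2, 0.2), (3, 0.3)]
```

The pairwise dominance check is quadratic in the number of configurations, which is negligible for the handful of prefetcher settings compared in such plots.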