CUBIT: Concurrent Updatable Bitmap Indexing (Extended Version)

Junchang Wang, Manos Athanassoulis

TL;DR

This paper proposes Concurrent Updatable Bitmap indexing (CUBIT), a bitmap index whose real-time updates scale with the number of CPU cores used and do not interfere with queries. It also introduces a lightweight snapshotting mechanism that lets queries run on separate snapshots with a wait-free progress guarantee.

Abstract

Bitmap indexes are widely used for read-intensive analytical workloads because they are clustered and offer efficient reads with a small memory footprint. However, they are notoriously inefficient to update. As analytical applications are increasingly fused with transactional applications, leading to the emergence of hybrid transactional/analytical processing (HTAP), it is desirable that bitmap indexes support efficient concurrent real-time updates. In this paper, we propose Concurrent Updatable Bitmap indexing (CUBIT) that offers efficient real-time updates that scale with the number of CPU cores used and do not interfere with queries. Our design relies on three principles. First, we employ a horizontal bitwise representation of updated bits, which enables efficient atomic updates without locking entire bitvectors. Second, we propose a lightweight snapshotting mechanism that allows queries (including range queries) to run on separate snapshots and provides a wait-free progress guarantee. Third, we consolidate updates in a latch-free manner, providing a strong progress guarantee. Our evaluation shows that CUBIT offers 3x - 16x higher throughput and 3x - 220x lower latency than state-of-the-art updatable bitmap indexes. CUBIT's update-friendly nature widens the applicability of bitmap indexing. Experimenting with OLAP workloads with standard, batched updates shows that CUBIT overcomes the maintenance downtime and outperforms DuckDB by 1.2x - 2.7x on TPC-H. For HTAP workloads with real-time updates, CUBIT achieves 2x - 11x performance improvement over the state-of-the-art approaches.
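
To make the abstract's starting point concrete, the following is a minimal sketch of a classic, uncompressed bitmap index, not CUBIT itself. It assumes one bitvector per distinct value, stored as a Python integer; the column `qty` and all function names are hypothetical and chosen only for illustration. A range query ORs bitvectors together, while an in-place update must modify two bitvectors, which is the step that becomes costly once bitvectors are compressed or shared between threads.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitvector; bit i is set when column[i] == value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def query_less_than(index, bound):
    """Range query: OR together the bitvectors of all values below `bound`."""
    result = 0
    for value, bitvector in index.items():
        if value < bound:
            result |= bitvector
    return result


def update_in_place(index, row_id, old_value, new_value):
    """In-place update: clear the bit in the old bitvector, set it in the new one."""
    index[old_value] &= ~(1 << row_id)
    index[new_value] = index.get(new_value, 0) | (1 << row_id)


def row_ids(bitvector):
    """Decode a bitvector into the sorted list of row ids whose bit is set."""
    return [i for i in range(bitvector.bit_length()) if (bitvector >> i) & 1]


qty = [1, 4, 2, 3, 1, 5]          # hypothetical QTY column, one entry per row
index = build_bitmap_index(qty)
before = row_ids(query_less_than(index, 3))
update_in_place(index, 1, 4, 2)   # row 1 changes from QTY=4 to QTY=2
after = row_ids(query_less_than(index, 3))
```

In this sketch every update writes directly into the bitvectors that queries read, so readers and writers would need to synchronize on them. The design principles listed in the abstract target that conflict.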

Paper Structure

This paper contains 31 sections, 21 figures, 1 table.

Figures (21)

  • Figure 1: In the presence of both queries and updates, access path selection between tree-based indexing, bitmap indexing, and sequential scans depends on selectivity and update rate. Unlike prior bitmap indexes that were mainly used for read-only workloads, our solution, CUBIT, enables bitmap indexing for higher update rates: OLAP with batched updates (OLAP evaluation section) and HTAP with real-time updates (HTAP evaluation section).
  • Figure 2: (a) Existing updatable bitmap indexes do not scale on multicores, and (b) their synchronization mechanisms incur long tail latency.
  • Figure 3: (a) A classic bitmap index. (b) The state-of-the-art updatable bitmap index UpBit [Athanassoulis2016a], which associates an update bitvector (UB) with each value bitvector (VB). (c) UpBit's updates, deletes, and inserts (UDIs) modify highly compressible UBs.
  • Figure 4: Answering queries of the form "QTY < 3" using either a B+-tree or a bitmap index. The bitmap index is clustered, composition-friendly, and space-efficient.
  • Figure 5: Existing updatable bitmap indexes do not scale because (a) contention arises for high concurrency and (b) update costs increase with large datasets.
  • ...and 16 more figures
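
The idea in the Figure 3(b) caption can be sketched as follows. This is a simplified illustration of an UpBit-style index, assuming that the current state of a value is the XOR of its value bitvector and its update bitvector, and that the caller supplies the old value of the updated row. The class `UpBitStyleIndex` and its method names are hypothetical. The sketch does not show CUBIT's own horizontal representation, snapshotting, or latch-free consolidation.

```python
class UpBitStyleIndex:
    """Toy index with one value bitvector (VB) and one update bitvector (UB) per value."""

    def __init__(self, column):
        self.vb = {}
        for row_id, value in enumerate(column):
            self.vb[value] = self.vb.get(value, 0) | (1 << row_id)
        # UBs start out all-zero, which is why they compress well.
        self.ub = {value: 0 for value in self.vb}

    def current(self, value):
        """Current bitvector of `value`: VB XOR UB."""
        return self.vb.get(value, 0) ^ self.ub.get(value, 0)

    def update(self, row_id, old_value, new_value):
        """Record an update by flipping one bit in two UBs; the VBs stay untouched."""
        self.ub[old_value] = self.ub.get(old_value, 0) ^ (1 << row_id)
        self.ub[new_value] = self.ub.get(new_value, 0) ^ (1 << row_id)

    def merge(self, value):
        """Fold the accumulated updates of `value` into its VB and reset its UB."""
        self.vb[value] = self.current(value)
        self.ub[value] = 0


idx = UpBitStyleIndex([1, 4, 2, 3, 1, 5])   # hypothetical QTY column
idx.update(1, 4, 2)                         # row 1 changes from QTY=4 to QTY=2
current_for_2 = idx.current(2)              # rows 1 and 2 now hold QTY=2
current_for_4 = idx.current(4)              # no row holds QTY=4 any more
vb_for_4_before_merge = idx.vb[4]           # the VB itself is still unchanged
idx.merge(2)
```

Here updates touch only the sparse UBs, and reads pay an extra XOR until `merge` is called. The sketch is single-threaded and says nothing about how concurrent readers and writers would be coordinated.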