DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup

Lixi Zhou; K. Selçuk Candan; Jia Zou

TL;DR

This work argues and shows that DeepMapping, a novel abstraction that relies on the memorization capabilities of deep neural networks, can simultaneously lower storage cost, lookup latency, and run-time memory footprint.

Abstract

Storing tabular data in a way that balances storage and query efficiency is a long-standing research question in the database community. In this work, we argue and show that a novel DeepMapping abstraction, which relies on the memorization capabilities of deep neural networks, can simultaneously lower storage cost, lookup latency, and run-time memory footprint. These properties may benefit a broad class of use cases on capacity-limited devices. DeepMapping transforms a dataset into multiple key-value mappings and constructs a multi-task neural network model that outputs the corresponding values for a given input key. To deal with memorization errors, DeepMapping couples the learned neural network with a lightweight auxiliary data structure that corrects the model's mistakes. The design of the auxiliary structure also lets DeepMapping handle insertions, deletions, and updates efficiently, without retraining the mapping. We propose a multi-task search strategy for selecting hybrid DeepMapping structures (including the model architecture and the auxiliary structure) with a desirable trade-off among memorization capacity, size, and efficiency. Extensive experiments with a real-world dataset and with synthetic and benchmark datasets, including TPC-H and TPC-DS, demonstrate that DeepMapping balances retrieval speed and compression ratio better than several cutting-edge competitors.
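The hybrid idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `HybridMapping`, `toy_model`, the Python `set` used for existence checks, and the `dict` used as the auxiliary table are hypothetical stand-ins, and a hand-written function replaces the trained multi-task network. The sketch only shows why lookups stay lossless when the model mispredicts, and why modifications need no retraining.

```python
from typing import Callable, Dict, Optional, Set


def toy_model(key: int) -> str:
    """Hypothetical stand-in for a trained network: guesses a value from the key."""
    return "even" if key % 2 == 0 else "odd"


class HybridMapping:
    """Toy learned mapping plus an auxiliary table that corrects its mistakes."""

    def __init__(self, model: Callable[[int], str], data: Dict[int, str]) -> None:
        self.model = model
        # Assumption: a plain set records which keys exist.
        self.keys: Set[int] = set(data)
        # Only the entries the model gets wrong are stored explicitly.
        self.aux: Dict[int, str] = {k: v for k, v in data.items() if model(k) != v}

    def lookup(self, key: int) -> Optional[str]:
        if key not in self.keys:      # key was never stored or was deleted
            return None
        if key in self.aux:           # model is known to be wrong here
            return self.aux[key]
        return self.model(key)        # model memorized this entry correctly

    def upsert(self, key: int, value: str) -> None:
        """Insert or update without retraining: patch the auxiliary table."""
        self.keys.add(key)
        if self.model(key) == value:
            self.aux.pop(key, None)   # model already agrees, no correction needed
        else:
            self.aux[key] = value

    def delete(self, key: int) -> None:
        self.keys.discard(key)
        self.aux.pop(key, None)


mapping = HybridMapping(toy_model, {1: "odd", 2: "even", 3: "prime", 4: "even"})
print(mapping.lookup(3), mapping.lookup(2), mapping.lookup(5))
```

In this toy setup only the single mispredicted entry is kept in the auxiliary table; the better the model memorizes the data, the smaller that table becomes, which is the trade-off among memorization capacity, size, and efficiency that the abstract refers to.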

Paper Structure (29 sections, 1 equation, 10 figures, 5 tables, 5 algorithms)

Introduction
Related Works
Problem and Desiderata
DeepMapping Architecture
Shared Multi-Task Network
Ensuring $\mathbf{100\%}$ Accuracy (Desideratum $\mathbf{\#1}$)
The Auxiliary Structures ($T_{aux}$)
Lookup Process
Multi-Task Hybrid Architecture Search (Desiderata $\mathbf{\#2, 3}$)
MHAS Multi-Task Search Space
Multi-Task Model Search Controller
DeepMapping Modification Operations (Desideratum $\mathbf{\#4}$)
Extension to Range Queries
Evaluation
Experimental Environment Setup
...and 14 more sections

Figures (10)

Figure 1: DeepMapping relies on neural networks to memorize key-value mapping in tabular data.
Figure 2: Overview of the proposed neural network-based data compression methods.
Figure 3: (a) A high-level view of a candidate model and (b) a DAG (including all nodes and edges) that represents the search space of one tree node in (a) -- here, the subgraph connected by the red edges illustrates a sampled network.
Figure 4: Trade-off between compression ratio and lookup performance in TPC-H (SF=10, B=100,000) on the small-size machine -- annotations are explained in a footnote.
Figure 5: Trade-off between compression ratio and lookup performance in TPC-DS (SF=10, B=100,000) on the small-size machine -- annotations are explained in a footnote.
...and 5 more figures
