Table of Contents
Fetching ...

MatrixGate: A High-performance Data Ingestion Tool for Time-series Databases

Shuhui Wang, Zihan Sun, Chaochen Hu, Chao Li, Yong Zhang, Yandong Yao, Hao Wang, Chunxiao Xing

TL;DR

Results show that MatrixGate outperforms state-of-the-art methods by 3 to 100 times in loading speed, and cuts down about 80% query latency, and scales out efficiently under distributed architecture, achieving scalability of 86%.

Abstract

Recent years have seen massive time-series data generated in many areas. This different scenario brings new challenges, particularly in terms of data ingestion, where existing technologies struggle to handle such massive time-series data, leading to low loading speed and poor timeliness. To address these challenges, this paper presents MatrixGate, a new and efficient data ingestion approach for massive time-series data. MatrixGate implements both single-instance and multi-instance parallel procedures, which is based on its unique ingestion strategies. First, MatrixGate uses policies to tune the slots that are synchronized with segments to ingest data, which eliminates the cost of starting transactions and enhance the efficiency. Second, multi-coroutines are responsible for transfer data, which can increase the degree of parallelism significantly. Third, lock-free queues are used to enable direct data transfer without the need for disk storage or lodging in the master instance. Experiment results on multiple datasets show that MatrixGate outperforms state-of-the-art methods by 3 to 100 times in loading speed, and cuts down about 80% query latency. Furthermore, MatrixGate scales out efficiently under distributed architecture, achieving scalability of 86%.

MatrixGate: A High-performance Data Ingestion Tool for Time-series Databases

TL;DR

Results show that MatrixGate outperforms state-of-the-art methods by 3 to 100 times in loading speed, and cuts down about 80% query latency, and scales out efficiently under distributed architecture, achieving scalability of 86%.

Abstract

Recent years have seen massive time-series data generated in many areas. This different scenario brings new challenges, particularly in terms of data ingestion, where existing technologies struggle to handle such massive time-series data, leading to low loading speed and poor timeliness. To address these challenges, this paper presents MatrixGate, a new and efficient data ingestion approach for massive time-series data. MatrixGate implements both single-instance and multi-instance parallel procedures, which is based on its unique ingestion strategies. First, MatrixGate uses policies to tune the slots that are synchronized with segments to ingest data, which eliminates the cost of starting transactions and enhance the efficiency. Second, multi-coroutines are responsible for transfer data, which can increase the degree of parallelism significantly. Third, lock-free queues are used to enable direct data transfer without the need for disk storage or lodging in the master instance. Experiment results on multiple datasets show that MatrixGate outperforms state-of-the-art methods by 3 to 100 times in loading speed, and cuts down about 80% query latency. Furthermore, MatrixGate scales out efficiently under distributed architecture, achieving scalability of 86%.
Paper Structure (23 sections, 1 theorem, 10 equations, 13 figures, 4 tables)

This paper contains 23 sections, 1 theorem, 10 equations, 13 figures, 4 tables.

Key Result

Theorem 1

The optimal number of slots $N^*$ is given by:

Figures (13)

  • Figure 1: Different ingestion approaches.
  • Figure 2: The architecture of MatrixGate.
  • Figure 3: The timing sequence of naive data ingestion.
  • Figure 4: The timing sequence of MatrixGate's data ingestion.
  • Figure 5: An example of violation of policy (3) causing unnecessary slot.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Theorem 1