Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure

Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson K. Leung

TL;DR

This work tackles the challenge of mining sequential patterns from uncertain databases with weights by introducing tightened upper bounds $expSup^{cap}$, $wgt^{cap}$, and $wExpSup^{cap}$ and a hierarchy-based index, the USeq-Trie. It presents a PrefixSpan-like algorithm, FUSP, that uses these bounds with the SupCalc support calculator to prune search space and accelerate mining, plus an incremental variant InUSP that uses Promising Frequent Sequences to maintain a near-complete set of patterns across data growth. The authors demonstrate substantial gains in pruning efficiency, runtime (often orders of magnitude faster than prior methods), and incremental completeness across multiple real-world datasets. Overall, the framework enables scalable, weighted, and incremental uncertain sequential pattern mining suitable for dynamic domains such as healthcare and sensor networks.

Abstract

In this uncertain world, data uncertainty is inherent in many applications, and its importance is growing rapidly with the development of modern technologies. Researchers have therefore paid increasing attention to mining patterns in uncertain databases. A few recent works attempt to mine frequent uncertain sequential patterns. Despite their success, they neither reduce the number of false-positive patterns generated during mining nor maintain the patterns efficiently. In this paper, we propose multiple theoretically tightened pruning upper bounds that remarkably reduce the mining space. A novel hierarchical structure is introduced to maintain the patterns in a space-efficient way. We then develop a versatile framework for mining uncertain sequential patterns that can also handle weight constraints effectively. Moreover, existing works do not scale to incremental uncertain databases. Several incremental sequential pattern mining algorithms exist, but they are limited to precise databases. Therefore, we propose a new technique that adapts our framework to mine patterns when the database is incremental. Finally, we conduct extensive experiments on several real-life datasets and show the efficacy of our framework in different applications.
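
To make the idea of a space-efficient hierarchical pattern store concrete, here is a minimal sketch of a generic prefix trie over patterns. It is not the paper's USeq-Trie, whose exact node layout is not described here; the names `TrieNode`, `PatternTrie`, and `node_count`, and the toy support values, are all hypothetical. The sketch only illustrates why sharing prefixes saves space: patterns with a common prefix reuse the same nodes.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # item -> TrieNode
        self.support = None   # set only when a stored pattern ends at this node


class PatternTrie:
    """Generic prefix trie over patterns (an assumed stand-in, not the USeq-Trie)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, pattern, support):
        node = self.root
        for item in pattern:
            # Reuse the child for this item if it exists, otherwise create it.
            if item not in node.children:
                node.children[item] = TrieNode()
            node = node.children[item]
        node.support = support

    def get(self, pattern):
        node = self.root
        for item in pattern:
            node = node.children.get(item)
            if node is None:
                return None
        return node.support

    def node_count(self):
        # Count every node except the root.
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count - 1


# Toy patterns with made-up support values.
trie = PatternTrie()
patterns = {("a",): 1.5, ("a", "b"): 0.45, ("a", "b", "c"): 0.18, ("a", "c"): 0.6}
for pattern, support in patterns.items():
    trie.insert(pattern, support)

flat_items = sum(len(p) for p in patterns)  # items stored if each pattern were kept separately
```

In this toy case the four patterns hold eight items when stored separately but need only four trie nodes, because `("a",)`, `("a", "b")`, and `("a", "b", "c")` lie on one root-to-leaf path.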

Paper Structure (8 sections, 4 theorems, 4 equations, 6 figures, 3 tables)

This paper contains 8 sections, 4 theorems, 4 equations, 6 figures, 3 tables.

Key Result

Lemma

For a sequence $\alpha$, $expSup^{cap}(\alpha) \geq expSup(\alpha)$ and $expSup(\alpha) \geq expSup(\alpha')$, where $\alpha \subseteq \alpha'$; $\therefore\ expSup^{cap}(\alpha) \geq expSup(\alpha')$. If $expSup^{cap}(\alpha) < \gamma$ holds for a minimum threshold $\gamma$, then $expSup(\alpha) < \gamma$ and, by the same chain of inequalities, $expSup(\alpha') < \gamma$ for every super-sequence $\alpha'$ of $\alpha$.
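
The pruning argument in this lemma can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's definitions: an uncertain sequence is modeled as a list of events mapping items to existential probabilities, the expected support of a pattern is taken as the sum over sequences of the best embedding's probability product, and `exp_sup_cap` is a deliberately loose stand-in bound (the product of each item's maximum probability in the sequence), not the paper's tightened $expSup^{cap}$. The point is only that any bound satisfying the lemma's two inequalities lets a pattern and all of its extensions be discarded together.

```python
from math import prod


def best_match_prob(pattern, seq):
    """Max probability product over embeddings of `pattern` into strictly increasing events of `seq`."""
    # best[k] = best probability of matching the first k pattern items so far
    best = [1.0] + [0.0] * len(pattern)
    for event in seq:
        # Update right to left so one event fills at most one pattern position.
        for k in range(len(pattern), 0, -1):
            p = event.get(pattern[k - 1], 0.0)
            best[k] = max(best[k], best[k - 1] * p)
    return best[-1]


def exp_sup(pattern, db):
    # Assumed definition of expected support for this toy model.
    return sum(best_match_prob(pattern, seq) for seq in db)


def exp_sup_cap(pattern, db):
    # Stand-in upper bound (NOT the paper's): ignore event order and
    # take each item's maximum probability anywhere in the sequence.
    return sum(
        prod(max((e.get(item, 0.0) for e in seq), default=0.0) for item in pattern)
        for seq in db
    )


def can_prune(pattern, db, gamma):
    # By the lemma, a bound below gamma rules out the pattern and every super-sequence.
    return exp_sup_cap(pattern, db) < gamma


# Toy uncertain database with made-up probabilities.
db = [
    [{"a": 0.9}, {"b": 0.5}, {"c": 0.4}],
    [{"a": 0.2, "b": 0.3}, {"a": 0.6}],
]
gamma = 0.7
```

With this data, `can_prune(("a", "b"), db, gamma)` holds, and both the pattern `("a", "b")` and its extension `("a", "b", "c")` indeed have expected support below `gamma`, so the extension never needs to be generated.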

Figures (6)

  • Figure 1: An efficient way to compute WES of patterns stored into USeq-Trie
  • Figure 2: FUSP outperforms uWSequence in candidate generation
  • Figure 3: Completeness comparison between $WIncSpan'$ and InUSP
  • Figure 4: Runtime comparison between $WIncSpan'$ and proposed InUSP
  • Figure 5: Comparison of scalability using Kosarak dataset
  • ...and 1 more figure

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4