Efficient Mining of Low-Utility Sequential Patterns

Jian Zhu; Zhidong Lin; Wensheng Gan; Ruichu Cai; Zhifeng Hao; Philip S. Yu

Efficient Mining of Low-Utility Sequential Patterns

Jian Zhu, Zhidong Lin, Wensheng Gan, Ruichu Cai, Zhifeng Hao, Philip S. Yu

TL;DR

This work formalizes the LUSPM problem, redefine sequence utility, and introduces a compact data structure called the sequence-utility chain to efficiently record utility information and propose three novel algorithm--LUSPM_b, LUSPM_s, and LUSPM_e--to discover the complete set of low-utility sequential patterns.

Abstract

Discovering valuable insights from rich data is a crucial task for exploratory data analysis. Sequential pattern mining (SPM) has found widespread applications across various domains. In recent years, low-utility sequential pattern mining (LUSPM) has shown strong potential in applications such as intrusion detection and genomic sequence analysis. However, existing research in utility-based SPM focuses on high-utility sequential patterns, and the definitions and strategies used in high-utility SPM cannot be directly applied to LUSPM. Moreover, no algorithms have yet been developed specifically for mining low-utility sequential patterns. To address these problems, we formalize the LUSPM problem, redefine sequence utility, and introduce a compact data structure called the sequence-utility chain to efficiently record utility information. Furthermore, we propose three novel algorithm--LUSPM_b, LUSPM_s, and LUSPM_e--to discover the complete set of low-utility sequential patterns. LUSPM_b serves as an exhaustive baseline, while LUSPM_s and LUSPM_e build upon it, generating subsequences through shrinkage and extension operations, respectively. In addition, we introduce the maximal non-mutually contained sequence set and incorporate multiple pruning strategies, which significantly reduce redundant operations in both LUSPM_s and LUSPM_e. Finally, extensive experimental results demonstrate that both LUSPM_s and LUSPM_e substantially outperform LUSPM_b and exhibit excellent scalability. Notably, LUSPM_e achieves superior efficiency, requiring less runtime and memory consumption than LUSPM_s. Our code is available at https://github.com/Zhidong-Lin/LUSPM.

Efficient Mining of Low-Utility Sequential Patterns

TL;DR

Abstract

Efficient Mining of Low-Utility Sequential Patterns

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (9)

Theorems & Definitions (28)