Representation Learning of Tangled Key-Value Sequence Data for Early Classification

Tao Duan; Junzhou Zhao; Shuo Zhang; Jing Tao; Pinghui Wang

TL;DR

This work tackles early and accurate classification of tangled key-value sequences, in which items with different keys intermingle within a single sequence. It introduces KVEC, a two-module framework comprising Key-Value Sequence Representation Learning (KVRL) and Early Co-classification Timing Learning (ECTL), which jointly learn semantically enriched representations and an adaptive halting policy. By exploiting both key and value correlations through a correlation-aware attention mechanism and a gating-based embedding fusion, KVEC achieves higher accuracy and faster decisions. On multiple real-world and synthetic datasets, it improves accuracy by up to $4.7$–$17.5\%$ under the same earliness and improves the harmonic mean (HM) of accuracy and earliness by up to $3.7$–$14.0\%$. The empirical results demonstrate KVEC’s effectiveness in practical scenarios such as e-commerce profiling and network traffic classification, and highlight the importance of modeling intra- and inter-sequence correlations for timely predictions.

Abstract

Key-value sequence data has become ubiquitous and appears naturally in a variety of real-world applications, ranging from user-product purchasing sequences in e-commerce to network packet sequences forwarded by routers. Classifying these key-value sequences is important in many scenarios, such as user profiling and malicious application identification. In many time-sensitive scenarios, a key-value sequence must be classified not only accurately but also early, so that the system can respond quickly. However, these two goals conflict by nature, and it is challenging to achieve them simultaneously. In this work, we formulate a novel tangled key-value sequence early classification problem, where a tangled key-value sequence is a mixture of several concurrent key-value sequences with different keys. The goal is to classify each individual key-value sequence, i.e., the items sharing the same key, both accurately and early. To address this problem, we propose a novel method, Key-Value sequence Early Co-classification (KVEC), which leverages both inner- and inter-correlations of items in a tangled key-value sequence through key correlation and value correlation to learn a better sequence representation. Meanwhile, a time-aware halting policy decides when to stop observing an ongoing key-value sequence and classify it based on the current sequence representation. Experiments on both real-world and synthetic datasets demonstrate that our method significantly outperforms the state-of-the-art baselines. KVEC improves the prediction accuracy by up to $4.7$–$17.5\%$ under the same prediction earliness condition, and improves the harmonic mean of accuracy and earliness by up to $3.7$–$14.0\%$.
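To make the problem setting concrete, the following is a minimal sketch of how a tangled key-value sequence separates into per-key sequences, and of how accuracy and earliness can be combined into a harmonic mean. It is an illustration only, not the KVEC implementation. The names `untangle`, `harmonic_mean`, and the toy flow keys are hypothetical, and the convention that earliness is the fraction of a sequence observed before halting (lower is better) is an assumption that the abstract does not spell out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Item = Tuple[Hashable, object]  # one (key, value) item of the tangled stream


def untangle(tangled: Iterable[Item]) -> Dict[Hashable, List[object]]:
    """Split a tangled stream into per-key sequences, preserving arrival order."""
    per_key = defaultdict(list)
    for key, value in tangled:
        per_key[key].append(value)
    return dict(per_key)


def harmonic_mean(accuracy: float, earliness: float) -> float:
    """Harmonic mean of accuracy and (1 - earliness).

    Assumption: earliness is the fraction of the sequence consumed before
    a decision is made, so it is inverted to make "higher is better".
    """
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0:
        return 0.0
    return 2.0 * accuracy * timeliness / (accuracy + timeliness)


# Toy stream: keys stand in for network flows, values for packet sizes.
tangled = [
    ("flow_a", 60),
    ("flow_b", 1500),
    ("flow_a", 52),
    ("flow_c", 40),
    ("flow_b", 1500),
]
sequences = untangle(tangled)
score = harmonic_mean(accuracy=0.8, earliness=0.2)
```

Here each entry of `sequences` is one individual key-value sequence that would receive its own label, while the interleaved order in `tangled` is what lets a model exploit correlations across keys.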

Paper Structure (20 sections, 18 equations, 12 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Notations and Problem Formulation
Key-Value Sequence Early Co-Classification
Overview of KVEC
Key-Value Sequence Representation Learning (KVRL)
Early Co-classification Timing Learning (ECTL)
Classification Network
Model Training
Experiments
Experimental Setup
Datasets
Baseline Methods
Performance Metrics
Settings
...and 5 more sections

Figures (12)

Figure 1: Tangled key-value sequence early classification. Items of different shapes represent different network packets. Packets of the same color mean that they have the same five-tuple and belong to the same network flow. We want to classify each network flow both accurately and early.
Figure 2: Overview of the KVEC framework.
Figure 3: Accuracy comparison
Figure 4: Precision comparison
Figure 5: Recall comparison
...and 7 more figures
