Representation Learning of Tangled Key-Value Sequence Data for Early Classification

Tao Duan, Junzhou Zhao, Shuo Zhang, Jing Tao, Pinghui Wang

TL;DR

This work tackles early and accurate classification of tangled key-value sequences, in which items with different keys intermingle within a single sequence. It introduces KVEC, a two-module framework comprising Key-Value Sequence Representation Learning (KVRL) and Early Co-classification Timing Learning (ECTL), which jointly learn semantically enriched representations and an adaptive halting policy. KVEC exploits both key and value correlations through a correlation-aware attention mechanism and a gating-based embedding fusion. On multiple real-world and synthetic datasets, it improves accuracy by up to $4.7$–$17.5\%$ under the same earliness and improves the harmonic mean (HM) of accuracy and earliness by up to $3.7$–$14.0\%$ over the baselines. The empirical results demonstrate KVEC’s effectiveness in practical scenarios such as e-commerce profiling and network traffic classification, and highlight the importance of modeling intra- and inter-sequence correlations for timely predictions.

Abstract

Key-value sequence data has become ubiquitous and naturally appears in a variety of real-world applications, ranging from user-product purchasing sequences in e-commerce to network packet sequences forwarded by routers in networking. Classifying these key-value sequences is important in many scenarios, such as user profiling and malicious application identification. In many time-sensitive scenarios, it is desirable to classify a key-value sequence not only accurately but also early, in order to respond quickly. However, these two goals are conflicting in nature, and it is challenging to achieve them simultaneously. In this work, we formulate a novel tangled key-value sequence early classification problem, where a tangled key-value sequence is a mixture of several concurrent key-value sequences with different keys. The goal is to classify each individual key-value sequence, i.e., the items sharing the same key, both accurately and early. To address this problem, we propose a novel method, Key-Value sequence Early Co-classification (KVEC), which leverages both inner- and inter-correlations of items in a tangled key-value sequence through key correlation and value correlation to learn a better sequence representation. Meanwhile, a time-aware halting policy decides when to stop observing an ongoing key-value sequence and classify it based on the current sequence representation. Experiments on both real-world and synthetic datasets demonstrate that our method significantly outperforms the state-of-the-art baselines. KVEC improves the prediction accuracy by up to $4.7$–$17.5\%$ under the same prediction earliness condition, and improves the harmonic mean of accuracy and earliness by up to $3.7$–$14.0\%$.
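
To make the problem setting concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how a tangled sequence can be split into per-key subsequences, and how a harmonic mean of accuracy and earliness might be computed. The function names, the example flow keys and packet sizes, and the definition of earliness as the fraction of a sequence observed before halting (lower is better, so the harmonic mean uses `1 - earliness`) are all assumptions for illustration; the abstract does not give the exact formula.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical item type: a (key, value) pair, e.g. (flow id, packet size).
Item = Tuple[Hashable, object]


def untangle(tangled: Iterable[Item]) -> Dict[Hashable, List[object]]:
    """Split a tangled sequence into per-key subsequences, keeping arrival order."""
    per_key: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in tangled:
        per_key[key].append(value)
    return dict(per_key)


def harmonic_mean(accuracy: float, earliness: float) -> float:
    """Harmonic mean of accuracy and (1 - earliness).

    Assumption: earliness is the fraction of items observed before the
    halting decision, so smaller values mean earlier classification.
    """
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0:
        return 0.0
    return 2 * accuracy * timeliness / (accuracy + timeliness)


# Illustrative data only: three interleaved network flows.
tangled = [
    ("flow_a", 60),
    ("flow_b", 1500),
    ("flow_a", 52),
    ("flow_c", 40),
    ("flow_b", 1500),
]
flows = untangle(tangled)
score = harmonic_mean(accuracy=0.9, earliness=0.1)
```

Here `flows` maps each key to its own sequence of values, which is the unit that receives a class label. A method such as KVEC would operate on the tangled stream directly in order to exploit correlations across keys, rather than classifying each untangled subsequence in isolation.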

Paper Structure

This paper contains 20 sections, 18 equations, 12 figures, 2 tables, 1 algorithm.

Figures (12)

  • Figure 1: Tangled key-value sequence early classification. Items of different shapes represent different network packets. Packets of the same color mean that they have the same five-tuple and belong to the same network flow. We want to classify each network flow both accurately and early.
  • Figure 2: Overview of the KVEC framework.
  • Figure 3: Accuracy comparison
  • Figure 4: Precision comparison
  • Figure 5: Recall comparison
  • ...and 7 more figures