Early-MFC: Enhanced Flow Correlation Attacks on Tor via Multi-view Triplet Networks with Early Network Traffic

Yali Yuan, Qianqi Niu, Yachao Yuan

TL;DR

This paper tackles the problem of identifying Tor users via flow correlation in the earliest stages of traffic, where existing methods struggle due to data requirements and latency. It introduces Early-MFC, a multi-view triplet-network framework that jointly learns embeddings from transport-layer payload (RAW) and inter-packet delays (IPDs), optimized with triplet and contrastive objectives and fused through a Bayesian arbiter. A complementary Early-MFC+ extension uses a payload-driven feature reconstruction to maintain high accuracy when only a small portion of packets is available, achieving about 93% accuracy with roughly 10% of the data. Across Tor-based datasets, Early-MFC significantly outperforms state-of-the-art attacks in accuracy and true positive rate while keeping false positives extremely low, enabling effective real-time flow correlation. The work opens a practical pathway for early-network traffic analysis and suggests robust multi-view fusion as a key strategy for rapid anonymity compromise in low-latency networks.

Abstract

Flow correlation attacks are an efficient class of network attacks that aim to expose users of anonymous network services such as Tor. Conducting such attacks during the early stages of network communication is particularly critical in scenarios that demand rapid decision-making, such as cybercrime detection or financial fraud prevention. Although recent studies have advanced flow correlation attack techniques, research that specifically addresses flow correlation with early network traffic remains limited. Moreover, because of model complexity, training costs, and real-time requirements, existing techniques cannot be applied directly to flow correlation with early network traffic. In this paper, we propose Early-MFC, a flow correlation attack for early network traffic based on multi-view triplet networks. The proposed approach extracts multi-view traffic features from the transport-layer payload and the Inter-Packet Delay. It then integrates the multi-view flow information, converting the extracted features into shared embeddings. By leveraging metric learning and contrastive learning, the method optimizes the embedding space so that similar flows are mapped closer together while dissimilar flows are positioned farther apart. Finally, Bayesian decision theory is applied to determine flow correlation, enabling high-accuracy flow correlation with early network traffic. Furthermore, we investigate flow correlation attacks under extra-early network traffic conditions. To address this challenge, we propose Early-MFC+, which uses payload data to construct embedded feature representations, ensuring robust performance even when very few packets are available.
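The metric-learning objective described in the abstract can be illustrated with a standard triplet margin loss. The sketch below is not the authors' implementation: the Euclidean distance, the margin value, the toy embedding values, and the helper names `fuse_views` and `triplet_loss` are all assumptions made for illustration, and simple concatenation stands in for whatever fusion the paper actually uses.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuse_views(raw_emb, ipd_emb):
    """Hypothetical fusion: concatenate the payload (RAW) and IPD embeddings.
    The paper's actual fusion step may differ."""
    return list(raw_emb) + list(ipd_emb)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings (assumed values): the positive is a correlated flow,
# the negative is an unrelated flow.
anchor = fuse_views([0.0, 0.0], [0.0])
positive = fuse_views([0.0, 1.0], [0.0])   # distance 1 from the anchor
negative = fuse_views([3.0, 0.0], [4.0])   # distance 5 from the anchor

well_separated = triplet_loss(anchor, positive, negative)  # no penalty
violating = triplet_loss(anchor, negative, positive)       # roles swapped, penalized
print(well_separated, violating)
```

Minimizing this loss over many triplets pulls correlated flows together and pushes uncorrelated flows apart in the shared embedding space, which is the behavior the abstract describes.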

Paper Structure

This paper contains 23 sections, 14 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: The structure of a Tor network. Data is collected at points A and B in the picture.
  • Figure 2: The framework of Early-MFC, which is roughly divided into three blocks. Block ➀ is the data-processing stage, which converts the pcap data into multi-view input data. The multi-view data is then fed into block ➁ for contrastive triplet training. Block ➂ makes the final arbitration through the Bayesian rule.
  • Figure 3: The structure of the feature reconstruction network.
  • Figure 4: Results for different choices of the second view under the same experimental conditions. (a) shows ACC, (b) shows TPR, and (c) shows FPR.
  • Figure 5: The figure describes the results of different arbitration