Distinguishing Tor From Other Encrypted Network Traffic Through Character Analysis
Pitpimon Choorod, Tobias J. Bauer, Andreas Aßmuth
TL;DR
The paper investigates whether Tor traffic can be distinguished from other encrypted traffic by analyzing hex-digit distributions in a single encrypted payload, and tests the hypothesis that multiple encryptions in Tor produce detectable patterns. Grounded in Shannon's perfect secrecy condition $Pr(C=c\,|\,M=m)=Pr(C=c)$ and adversarial indistinguishability, it compares prior hex-frequency approaches with controlled AES-based single-vs-triple-encryption experiments across CBC, CTR, and ECB modes. It finds that classifiers using 16-hex-digit frequency features achieve near-random accuracy ($\approx50\%$) for distinguishing single- vs triple-encrypted data, even in ECB mode, challenging the idea that the number of encryptions drives distinguishability. The results imply that the previously reported high Tor-vs-non-Tor discrimination cannot be attributed solely to encryption count, highlighting a need to identify the actual factors behind traffic distinguishability in practice.
Abstract
For journalists reporting from a totalitarian regime, whistleblowers and resistance fighters, the anonymous use of cloud services on the Internet can be vital for survival. The Tor network provides a free and widely used anonymization service for everyone. However, there are several approaches to distinguishing Tor from non-Tor encrypted network traffic, the most recent relying only on the (relative) frequencies of hex digits in a single encrypted payload packet. Conventional data traffic is usually encrypted once, whereas Tor traffic is encrypted at least three times because of the structure and principle of the Tor network. We have therefore examined to what extent the number of encryptions contributes to the ability to distinguish Tor from non-Tor encrypted data traffic.
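To make the feature concrete, the following is a minimal sketch of how the relative frequencies of the 16 hex digits in a payload could be computed, and how a payload could be passed through one or three encryption layers before feature extraction. It is not the authors' code. The experiments summarized above use AES in CBC, CTR, and ECB modes; to stay within the Python standard library, this sketch assumes a stand-in stream cipher built from a SHA-256 counter keystream, and all function names (`hex_digit_frequencies`, `toy_stream_encrypt`, `encrypt_layers`) are hypothetical.

```python
import hashlib
import os
from collections import Counter

HEX_DIGITS = "0123456789abcdef"


def hex_digit_frequencies(payload: bytes) -> list:
    """Return the relative frequency of each of the 16 hex digits in payload."""
    hex_string = payload.hex()
    if not hex_string:
        return [0.0] * len(HEX_DIGITS)
    counts = Counter(hex_string)
    total = len(hex_string)
    return [counts.get(digit, 0) / total for digit in HEX_DIGITS]


def toy_stream_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT AES): XOR data with a SHA-256 counter keystream."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        keystream += block
        counter += 1
    # zip() stops at len(data), so surplus keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, keystream))


def encrypt_layers(data: bytes, layers: int) -> bytes:
    """Apply the stand-in cipher `layers` times, each with a fresh key and nonce."""
    for _ in range(layers):
        data = toy_stream_encrypt(os.urandom(16), os.urandom(16), data)
    return data


# Assumed example plaintext; any byte string works.
plaintext = b"GET /index.html HTTP/1.1\r\n" * 40

single = hex_digit_frequencies(encrypt_layers(plaintext, 1))
triple = hex_digit_frequencies(encrypt_layers(plaintext, 3))

for digit, f1, f3 in zip(HEX_DIGITS, single, triple):
    print(f"{digit}: single={f1:.4f} triple={f3:.4f}")
```

Each feature vector has 16 entries that sum to 1. If the cipher output is close to uniformly random, every entry should lie near $1/16$ regardless of the number of layers, which is the intuition behind asking whether encryption count alone can explain distinguishability. A classifier experiment would collect many such vectors per class; that step is omitted here.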
