Shining Light into the Tunnel: Understanding and Classifying Network Traffic of Residential Proxies
Ronghong Huang, Dongfang Zhao, Xianghang Mi, Xiaofeng Wang
TL;DR
This work addresses the lack of large-scale residential proxy (RESIP) traffic datasets and robust detection methods by building a scalable framework that deploys RESIP nodes, collects their traffic, and analyzes it for security risks. It introduces a RESIP traffic analyzer along with both transformer-based and feature-based classifiers that detect relayed and tunnel RESIP traffic using early-flow information, validated on a 3.3 TB dataset of 116 million flows collected from PacketStream, IPRoyal, and Honeygain in the US and China. The study reveals novel misuse patterns, including masquerading visits to security-sensitive sites, large-scale email spamming, and visits to malicious destinations, supported by threat-intelligence profiling. The findings inform practical defense strategies and policy recommendations, including detector deployment and owner-authorization mechanisms, and the work contributes publicly accessible data and tools to advance RESIP research and security assessment.
Abstract
Residential proxies (RESIPs), which have emerged in recent years, differ from traditional network proxies (e.g., commercial VPNs) in several ways: they are deployed in residential networks rather than data center networks, they are distributed worldwide across tens of thousands of cities and ISPs, and they operate at the scale of millions of exit nodes. These factors allow RESIP users to effectively masquerade their traffic flows as those of authentic residential users, which has led to the increasing adoption of RESIP services, especially in malicious online activities. However, the (malicious) usage of RESIPs, i.e., what traffic is relayed by RESIPs, remains insufficiently understood. In particular, previous works on RESIP traffic studied only the maliciousness of web traffic destinations and the suspicious patterns of visiting popular websites. A general methodology for capturing large-scale RESIP traffic and analyzing it for security risks is also missing. Furthermore, because many RESIP nodes are found to be located in corporate networks and are deployed without proper authorization from device owners or network administrators, it is becoming increasingly necessary to detect and block RESIP traffic flows, a task that is unfortunately impeded by the scarcity of realistic RESIP traffic datasets and effective detection methodologies. To fill these gaps, this study designs and implements multiple novel tools: a general framework to deploy RESIP nodes and collect RESIP traffic in a distributed manner, a RESIP traffic analyzer to efficiently process RESIP traffic logs and surface suspicious traffic flows, and multiple machine-learning-based RESIP traffic classifiers to detect, in a timely and accurate manner, whether a given traffic flow is RESIP traffic.
