WHOIS Right? An Analysis of WHOIS and RDAP Consistency
Simon Fernandez, Olivier Hureau, Andrzej Duda, Maciej Korczynski
TL;DR
This work interrogates the common assumption that WHOIS and RDAP provide interchangeable domain registration data. It introduces a scalable, ethics-conscious data collection and parsing pipeline and analyzes 164 million records from 55 million domains to assess cross-source consistency. The study finds that while overall agreement is high, 7.6% of domains exhibit inconsistencies in key fields, with RDAP data aligning with DNS more often and being correct in about 78% of cross-protocol mismatches. The findings highlight the need for multi-source verification in security metrics and GDPR-driven data considerations, and the authors contribute a publicly available dataset and tooling to support further research.
Abstract
Public registration information on domain names, such as the accredited registrar, the domain name expiration date, or the abuse contact, is crucial for many security tasks, from automated abuse notifications to botnet or phishing detection and classification systems. Domain registration data is usually accessible through the WHOIS or RDAP protocols. A priori, they provide the same data but use distinct formats and communication protocols: WHOIS aims to provide human-readable data, whereas RDAP uses a machine-readable format. Therefore, deciding which protocol to use is generally considered a straightforward technical choice, depending on the use case and the required automation and security level. In this paper, we examine the core assumption that WHOIS and RDAP offer the same data and that users can query them interchangeably. By collecting, processing, and comparing 164 million WHOIS and RDAP records for a sample of 55 million domain names, we reveal that while the data obtained through WHOIS and RDAP is generally consistent, 7.6% of the observed domains still present inconsistent data on important fields like IANA ID, creation date, or nameservers. Security stakeholders who rely on the accuracy of these fields should take such discrepancies into account.
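To make the notion of a cross-protocol inconsistency concrete, the following is a minimal sketch of a field-by-field comparison between one parsed WHOIS record and one parsed RDAP record for the same domain. It is not the authors' pipeline: the dictionary keys (`iana_id`, `creation_date`, `nameservers`), the helper names, and the normalization rules are illustrative assumptions, and the sketch presumes both records have already been parsed into plain Python values.

```python
from datetime import datetime, timezone

# Hypothetical field names chosen for illustration; a real parser
# may expose these fields under different keys.
FIELDS = ("iana_id", "creation_date", "nameservers")


def normalize(field, value):
    """Bring a field value into a canonical form before comparing."""
    if value is None:
        return None
    if field == "iana_id":
        # WHOIS output often yields a string, RDAP JSON may yield a number.
        return int(value)
    if field == "creation_date":
        # Assumes an ISO 8601 timestamp; a trailing "Z" is rewritten so that
        # datetime.fromisoformat accepts it on older Python versions.
        dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return dt.astimezone(timezone.utc).replace(microsecond=0)
    if field == "nameservers":
        # Order, letter case, and a trailing dot are not meaningful differences.
        return frozenset(ns.strip().rstrip(".").lower() for ns in value)
    return value


def inconsistent_fields(whois, rdap):
    """Return the fields present in both records whose values disagree."""
    mismatches = []
    for field in FIELDS:
        w = normalize(field, whois.get(field))
        r = normalize(field, rdap.get(field))
        if w is not None and r is not None and w != r:
            mismatches.append(field)
    return mismatches


whois_record = {
    "iana_id": "146",
    "creation_date": "2010-05-01T12:00:00Z",
    "nameservers": ["NS1.EXAMPLE.NET.", "ns2.example.net"],
}
rdap_record = {
    "iana_id": 146,
    "creation_date": "2010-05-02T12:00:00+00:00",
    "nameservers": ["ns2.example.net", "ns1.example.net"],
}
print(inconsistent_fields(whois_record, rdap_record))
```

In this made-up example only the creation date differs once formatting differences are normalized away. Fields missing from either record are skipped rather than counted as mismatches, which is one possible design choice among several.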
