WHOIS Right? An Analysis of WHOIS and RDAP Consistency

Simon Fernandez, Olivier Hureau, Andrzej Duda, Maciej Korczynski

TL;DR

This work interrogates the common assumption that WHOIS and RDAP provide interchangeable domain registration data. It introduces a scalable, ethics-conscious data collection and parsing pipeline and analyzes 164 million records from 55 million domains to assess cross-source consistency. The study finds that while overall agreement is high, 7.6% of domains exhibit inconsistencies in key fields, with RDAP data aligning with DNS more often and being correct in about 78% of cross-protocol mismatches. The findings highlight the need for multi-source verification in security metrics and GDPR-driven data considerations, and the authors contribute a publicly available dataset and tooling to support further research.

Abstract

Public registration information on domain names, such as the accredited registrar, the domain name expiration date, or the abuse contact, is crucial for many security tasks, from automated abuse notifications to botnet or phishing detection and classification systems. Domain registration data is usually accessible through the WHOIS or RDAP protocols; a priori, they provide the same data but use distinct formats and communication protocols. While WHOIS aims to provide human-readable data, RDAP uses a machine-readable format. Therefore, deciding which protocol to use is generally considered a straightforward technical choice, depending on the use case and the required automation and security level. In this paper, we examine the core assumption that WHOIS and RDAP offer the same data and that users can query them interchangeably. By collecting, processing, and comparing 164 million WHOIS and RDAP records for a sample of 55 million domain names, we reveal that while the data obtained through WHOIS and RDAP is generally consistent, 7.6% of the observed domains still present inconsistent data on important fields like IANA ID, creation date, or nameservers. Such discrepancies deserve careful consideration from security stakeholders who rely on the accuracy of these fields.
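
To make the notion of a cross-protocol inconsistency concrete, the following is a minimal sketch of a field-by-field comparison between one WHOIS record and one RDAP record that have already been parsed into dictionaries. It is not the authors' pipeline: the key names (`iana_id`, `creation_date`, `nameservers`), the normalization rules, and the choice to skip fields missing from one source are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical field names; the abstract names these three fields as examples.
FIELDS = ("iana_id", "creation_date", "nameservers")


def normalize(field, value):
    """Bring a parsed value into a canonical form so that purely cosmetic
    differences (case, trailing dots, time zone notation) do not count."""
    if value is None:
        return None
    if field == "iana_id":
        # Treat "0146" and 146 as the same registrar ID (assumption).
        return str(value).strip().lstrip("0") or "0"
    if field == "creation_date":
        # Assumes ISO 8601 input; naive timestamps are treated as UTC.
        dt = datetime.fromisoformat(str(value).strip().replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.astimezone(timezone.utc).replace(microsecond=0)
    if field == "nameservers":
        # Order-insensitive, case-insensitive, trailing dot ignored.
        return frozenset(ns.strip().rstrip(".").lower() for ns in value)
    raise ValueError(f"unknown field: {field}")


def mismatched_fields(whois_record, rdap_record):
    """Return the fields whose normalized values differ between the two
    records. Fields absent from either record are skipped (assumption)."""
    diffs = []
    for field in FIELDS:
        w = normalize(field, whois_record.get(field))
        r = normalize(field, rdap_record.get(field))
        if w is None or r is None:
            continue
        if w != r:
            diffs.append(field)
    return diffs


if __name__ == "__main__":
    whois = {
        "iana_id": "0146",
        "creation_date": "2020-01-02T03:04:05Z",
        "nameservers": ["NS1.EXAMPLE.COM.", "ns2.example.com"],
    }
    rdap = {
        "iana_id": 146,
        "creation_date": "2020-01-02T04:04:05+01:00",
        "nameservers": ["ns2.example.com", "ns3.example.com"],
    }
    print(mismatched_fields(whois, rdap))
```

In this sketch a domain would count as inconsistent when `mismatched_fields` returns a non-empty list. How strictly to normalize (for example, whether dates must match to the second or only to the day) is a design choice that directly affects the measured mismatch rate.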

Paper Structure (33 sections, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Referral system to obtain complete registration data
  • Figure 2: The stages of domain selection with the number of domains at each step
  • Figure 3: Nameserver mismatch rate per registrar
  • Figure 4: Cumulative distribution of creation and expiration date mismatches
  • Figure 5: Creation date mismatch rate per registrar
  • ...and 2 more figures