WHOIS Right? An Analysis of WHOIS and RDAP Consistency

Simon Fernandez; Olivier Hureau; Andrzej Duda; Maciej Korczynski

TL;DR

This work interrogates the common assumption that WHOIS and RDAP provide interchangeable domain registration data. It introduces a scalable, ethics-conscious data collection and parsing pipeline and analyzes 164 million records from 55 million domains to assess cross-source consistency. The study finds that while overall agreement is high, 7.6% of domains exhibit inconsistencies in key fields. RDAP data aligns with DNS more often and is correct in about 78% of cross-protocol mismatches. The findings highlight the need for multi-source verification in security metrics and for GDPR-driven data considerations. The authors also contribute a publicly available dataset and tooling to support further research.

Abstract

Public registration information on domain names, such as the accredited registrar, the domain name expiration date, or the abuse contact, is crucial for many security tasks, from automated abuse notifications to botnet or phishing detection and classification systems. Domain registration data is usually accessible through the WHOIS or RDAP protocols; a priori, they provide the same data but use distinct formats and communication protocols. While WHOIS aims to provide human-readable data, RDAP uses a machine-readable format. Therefore, deciding which protocol to use is generally considered a straightforward technical choice that depends on the use case and the required level of automation and security. In this paper, we examine the core assumption that WHOIS and RDAP offer the same data and that users can query them interchangeably. By collecting, processing, and comparing 164 million WHOIS and RDAP records for a sample of 55 million domain names, we find that although the data obtained through WHOIS and RDAP is generally consistent, 7.6% of the observed domains still present inconsistent data in important fields such as the IANA ID, creation date, or nameservers. Security stakeholders who rely on the accuracy of these fields should take such discrepancies into careful consideration.
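
To make the field-level comparison concrete, the following is a minimal sketch and not the authors' pipeline. It assumes gTLD-style "Key: Value" WHOIS text using the labels `Registrar IANA ID`, `Creation Date`, and `Name Server`, and an RDAP domain object shaped as in RFC 9083 (`events` with `eventAction`, `nameservers` with `ldhName`, and a registrar entity carrying an `IANA Registrar ID` public ID). The function names, the day-level date normalization, and the sample responses are hypothetical choices made for illustration only.

```python
import json


def parse_whois(text):
    """Pull three fields out of gTLD-style 'Key: Value' WHOIS text (hypothetical parser)."""
    record = {"iana_id": None, "creation_date": None, "nameservers": set()}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "registrar iana id":
            record["iana_id"] = value
        elif key == "creation date":
            record["creation_date"] = value[:10]  # assumption: compare at day granularity
        elif key == "name server" and value:
            record["nameservers"].add(value.lower().rstrip("."))
    return record


def parse_rdap(doc):
    """Pull the same three fields out of an RDAP domain object (RFC 9083 layout)."""
    record = {"iana_id": None, "creation_date": None, "nameservers": set()}
    for event in doc.get("events", []):
        if event.get("eventAction") == "registration":
            record["creation_date"] = event.get("eventDate", "")[:10]
    for ns in doc.get("nameservers", []):
        name = ns.get("ldhName")
        if name:
            record["nameservers"].add(name.lower().rstrip("."))
    for entity in doc.get("entities", []):
        if "registrar" in entity.get("roles", []):
            for pid in entity.get("publicIds", []):
                if pid.get("type") == "IANA Registrar ID":
                    record["iana_id"] = pid.get("identifier")
    return record


def mismatched_fields(whois_rec, rdap_rec):
    """Names of fields whose normalized values differ between the two sources."""
    return sorted(f for f in whois_rec if whois_rec[f] != rdap_rec.get(f))


# Made-up sample responses for illustration only (not data from the study).
WHOIS_SAMPLE = """Domain Name: EXAMPLE.COM
Registrar IANA ID: 1234
Creation Date: 1997-09-15T04:00:00Z
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
"""

RDAP_SAMPLE = json.loads("""{
  "objectClassName": "domain",
  "ldhName": "example.com",
  "events": [{"eventAction": "registration", "eventDate": "1997-09-15T04:00:00Z"}],
  "nameservers": [{"objectClassName": "nameserver", "ldhName": "ns1.example.net"},
                  {"objectClassName": "nameserver", "ldhName": "ns3.example.net"}],
  "entities": [{"objectClassName": "entity", "roles": ["registrar"],
                "publicIds": [{"type": "IANA Registrar ID", "identifier": "1234"}]}]
}""")

print(mismatched_fields(parse_whois(WHOIS_SAMPLE), parse_rdap(RDAP_SAMPLE)))
# prints ['nameservers']
```

Normalizing both sources into the same small record before comparing keeps the mismatch test trivial. Real responses vary widely in labels and layout, so a production parser would need many more label variants and date formats than this sketch handles.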

Paper Structure (33 sections, 7 figures, 5 tables)

Introduction
Background
The Ecosystem of Domain Management and Registration
Why Two Different Systems?
Data Access and Availability
Parsing Registration Data
Methodology
Domain Data Collection and Filtering
Compilation of registered domain names.
Filtering domains with valid WHOIS and RDAP servers.
Gathering and Parsing Registration Data
Data collection.
Parsing WHOIS.
Parsing RDAP.
Field selection.

Figures (7)

Figure 1: Referral system to obtain complete registration data
Figure 2: The stages of domain selection with the number of domains at each step
Figure 3: Nameserver mismatch rate per registrar
Figure 4: Cumulative distribution of creation and expiration date mismatches
Figure 5: Creation date mismatch rate per registrar
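
Figure 1 refers to the referral system used to obtain complete registration data. As a hedged illustration, and not the authors' collector, the sketch below assumes a two-step lookup in which the registry's response points to the registrar's server. In WHOIS, this pointer is assumed to be a `Registrar WHOIS Server:` line. In RDAP, it is assumed to be a `links` entry with `rel` set to `related` and type `application/rdap+json`. The function names and sample hostnames are hypothetical, and network I/O is omitted so that only the extraction step is shown.

```python
def whois_referral(registry_text):
    """Return the registrar WHOIS server named in a registry WHOIS response, if any."""
    for line in registry_text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "registrar whois server" and value.strip():
            return value.strip()
    return None


def rdap_referral(registry_doc):
    """Return the registrar RDAP URL from a registry RDAP 'related' link, if any."""
    for link in registry_doc.get("links", []):
        if link.get("rel") == "related" and link.get("type") == "application/rdap+json":
            return link.get("href")
    return None


# Made-up registry responses for illustration only.
REGISTRY_WHOIS = """   Domain Name: EXAMPLE.COM
   Registrar WHOIS Server: whois.registrar.example
   Registrar URL: http://www.registrar.example
"""

REGISTRY_RDAP = {
    "links": [
        {"rel": "self", "type": "application/rdap+json",
         "href": "https://rdap.registry.example/domain/example.com"},
        {"rel": "related", "type": "application/rdap+json",
         "href": "https://rdap.registrar.example/domain/example.com"},
    ]
}

print(whois_referral(REGISTRY_WHOIS))  # prints whois.registrar.example
print(rdap_referral(REGISTRY_RDAP))  # prints https://rdap.registrar.example/domain/example.com
```

A second query to the returned server or URL would then fetch the registrar-level record. Returning `None` when no referral is present lets a caller fall back to the registry-level data alone.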
