Unpacking .zip: A First Look at Domain and File Name Confusion

Predrag Despotovic, Pranab Mishra, Kevin Rossel, Athanasios Avgetidis, Zane Ma

Abstract

The namespace for filenames and DNS names has overlapped since the introduction of DNS in 1985: .com was the original binary format used for DOS and CP/M systems. Recently, the introduction of gTLDs such as .zip and .mov, coupled with the growing prevalence of web resources, has ignited new concerns about potential issues related to DNS and filename confusion. Thus far, the discourse on DNS/filename confusion has been piecemeal and hypothetical, making it unclear what, if any, security concerns credibly exist. To address this gap, we provide the first enumeration of how DNS/filename confusion can be abused. We then perform the first empirical case studies of DNS/filename confusion in the wild, which highlight suspected confusion across a wide range of software. Finally, based on our preliminary findings, we provide suggestions and guidance for future research on this topic.

Paper Structure

This paper contains 21 sections, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Conceptual threat model of namespace confusion between filenames and domains.
  • Figure 2: TLDs with likely filename confusion and relatively low DNS usage.
  • Figure 3: Filtering pipeline for HTTP request dataset. Each stage represents a filtering step applied to remove irrelevant or automated traffic, showing the remaining number of requests ($n$) after each step.
  • Figure 4: Cumulative distribution function of the number of requests per IP address.
  • Figure 5: Geographic distribution of preview and click requests.