A Matrix Factorization Based Network Embedding Method for DNS Analysis
Meng Qin
TL;DR
This work addresses DNS entity analysis by learning joint embeddings for domain names and IP addresses from passive DNS logs using a similarity-enhanced graph and a matrix-factorization objective. The MF-DNS-E method builds a bipartite graph of DNS queries, derives domain/IP similarities, and applies a matrix-factorization framework with SVD to produce embeddings for domains and IPs. It further integrates supervised signals for malicious domain detection and IP reputation through logistic regression and a graph-regularization term, forming a semi-supervised objective optimized by gradient descent. The resulting embeddings enable improved DNS security analytics and can be extended to additional DNS entities and dynamic graph settings to capture temporal query patterns.
Abstract
In this paper, I explore the potential of network embedding (a.k.a. graph representation learning) to characterize DNS entities in passive network traffic logs. I propose an MF-DNS-E (Matrix-Factorization-based DNS Embedding) method to represent DNS entities (e.g., domain names and IP addresses), where a random-walk-based matrix factorization objective is applied to learn the corresponding low-dimensional embeddings.
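To make the pipeline summarized above more concrete, the following is a minimal sketch of one way to factorize a random-walk proximity matrix over a domain/IP bipartite graph. It is an illustration under stated assumptions, not the exact MF-DNS-E objective: the function names, the `window` parameter, the averaged multi-step transition matrix, and the toy records (documentation-range IPs and `.example` domains) are all hypothetical, and the similarity-enhancement and supervised terms are omitted.

```python
import numpy as np


def build_bipartite_adjacency(records):
    """Build a symmetric adjacency matrix from (domain, ip) query pairs.

    Domains occupy the first rows/columns, IP addresses the remaining ones.
    """
    records = list(records)
    domains = sorted({d for d, _ in records})
    ips = sorted({ip for _, ip in records})
    n_dom = len(domains)
    dom_idx = {d: i for i, d in enumerate(domains)}
    ip_idx = {ip: n_dom + j for j, ip in enumerate(ips)}
    n = n_dom + len(ips)
    adj = np.zeros((n, n))
    for d, ip in records:
        # Edge weight counts how often the domain resolved to the IP.
        adj[dom_idx[d], ip_idx[ip]] += 1.0
        adj[ip_idx[ip], dom_idx[d]] += 1.0
    return adj, domains, ips


def random_walk_proximity(adj, window=3):
    """Average the 1..window step random-walk transition matrices.

    Assumption: this averaged matrix stands in for the paper's
    random-walk-based proximity; the actual objective may differ.
    """
    deg = adj.sum(axis=1, keepdims=True)  # every node has >= 1 edge here
    trans = adj / deg                     # row-stochastic transition matrix
    prox = np.zeros_like(trans)
    step = np.eye(adj.shape[0])
    for _ in range(window):
        step = step @ trans
        prox += step
    return prox / window


def svd_embeddings(prox, dim):
    """Factorize the proximity matrix with SVD and keep the top `dim` factors."""
    u, s, _ = np.linalg.svd(prox)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical passive DNS records: (queried domain, resolved IP).
records = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("c.example", "198.51.100.7"),
]
adj, domains, ips = build_bipartite_adjacency(records)
emb = svd_embeddings(random_walk_proximity(adj, window=3), dim=2)
domain_emb, ip_emb = emb[: len(domains)], emb[len(domains):]
```

In this toy graph, `a.example` and `b.example` resolve to the same IP address, so their rows of the proximity matrix coincide and their embeddings end up closer to each other than to `c.example`. A dense SVD is used only for brevity; a real passive DNS graph would need sparse matrices and a truncated solver.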
