LINSCAN -- A Linearity Based Clustering Algorithm
Andrew Dennehy, Xiaoyu Zou, Shabnam J. Semnani, Yuri Fialko, Alexander Cloninger
TL;DR
LINSCAN identifies quasi-linear clusters in noisy point clouds, such as seismic faults. It embeds each point as a local Gaussian fitted to its eccPts nearest neighbors and clusters in distribution space using a KL-inspired distance $D(P,Q)$. The method preserves the stability and permutation invariance of DBSCAN/OPTICS while separating linearly shaped clusters that are spatially close but have orthogonal covariances. This is possible because $D(P,Q)$ is a symmetric, distance-like measure that accounts for both covariance structure and differences between means. Key contributions are the embedding scheme $x\mapsto \mathcal{N}(\mu,\Sigma)$, the distance $D(P,Q)$ derived from approximations of the KL divergence, and empirical validation on synthetic and seismic data, including a covariance-based quality filter that enforces linearity of the resulting clusters. The approach improves identification of linear slip faults and may extend to other directional spatial patterns, offering a practical, scalable variant of DBSCAN/OPTICS with a bias toward lineation.
Abstract
DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback-Leibler divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms.
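
To make the embedding idea concrete, the following is a minimal sketch for 2D points, not the authors' implementation. It fits a Gaussian to the k nearest neighbors of a point and compares two such Gaussians with the standard symmetrized KL divergence, which is used here only as a stand-in: the paper's $D(P,Q)$ is derived from approximations of the KL divergence and is not reproduced. The function names, the population covariance (dividing by k), and the small ridge term `reg` that keeps the covariance invertible are all assumptions made for illustration.

```python
from math import dist

def local_gaussian(points, i, k, reg=1e-6):
    """Fit a 2D Gaussian to the k nearest neighbors of points[i] (itself included).

    Returns (mean, cov), with the symmetric covariance stored as (sxx, sxy, syy).
    `reg` is a hypothetical ridge added to the diagonal so that perfectly
    collinear neighborhoods still give an invertible covariance.
    """
    nbrs = sorted(points, key=lambda p: dist(p, points[i]))[:k]
    mx = sum(p[0] for p in nbrs) / k
    my = sum(p[1] for p in nbrs) / k
    sxx = sum((p[0] - mx) ** 2 for p in nbrs) / k + reg
    syy = sum((p[1] - my) ** 2 for p in nbrs) / k + reg
    sxy = sum((p[0] - mx) * (p[1] - my) for p in nbrs) / k
    return (mx, my), (sxx, sxy, syy)

def inv2(S):
    """Inverse of a symmetric 2x2 matrix stored as (xx, xy, yy)."""
    a, b, c = S
    det = a * c - b * b
    return (c / det, -b / det, a / det)

def trace_prod(A, B):
    """tr(A B) for two symmetric 2x2 matrices stored as (xx, xy, yy)."""
    return A[0] * B[0] + 2.0 * A[1] * B[1] + A[2] * B[2]

def symmetric_kl(P, Q):
    """Symmetrized KL divergence KL(P||Q) + KL(Q||P) between 2D Gaussians.

    Stand-in for the paper's D(P, Q); the log-determinant terms cancel
    in the symmetrized sum, leaving trace and Mahalanobis-type terms.
    """
    (mp, Sp), (mq, Sq) = P, Q
    Ip, Iq = inv2(Sp), inv2(Sq)
    dx, dy = mp[0] - mq[0], mp[1] - mq[1]
    M = tuple(u + v for u, v in zip(Ip, Iq))  # Sp^{-1} + Sq^{-1}
    maha = M[0] * dx * dx + 2.0 * M[1] * dx * dy + M[2] * dy * dy
    # The constant 4.0 is 2 * d with dimension d = 2.
    return 0.5 * (trace_prod(Iq, Sp) + trace_prod(Ip, Sq) - 4.0 + maha)
```

Under this stand-in, two Gaussians with identical means but orthogonal elongated covariances are far apart, while a Gaussian is at distance zero from itself. This is the property that lets a density-based clusterer operating in distribution space separate intersecting lineations that are close in Euclidean space.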
