DEViaN-LM: An R Package for Detecting Abnormal Values in the Gaussian Linear Model
Geoffroy Berthelot, Guillaume Saulière, Jérôme Dedecker
TL;DR
The paper addresses detecting abnormal values poorly explained by a Gaussian linear model by leveraging the maximum absolute value of externally studentized residuals, $T_n = \max_i |\hat e_i(X)|$, whose distribution is free of the unknown parameters $\theta$ and $\sigma^2$ when conditioned on the design $M$. Because the distribution depends on the design, the authors propose Monte-Carlo estimation of quantiles $c_{\alpha,n}$ and p-values for a given $M$ within the DEViaN-LM R package. The package returns the residuals, outlier indices, the threshold, and a binary outlier indicator, enabling automated abnormal-value detection across real datasets. They demonstrate applications to biological and sociological data, and show favorable runtime performance relative to naïve implementations. The work provides a practical, design-aware tool for individualized outlier detection in Gaussian linear models with clear implications for precision medicine and longitudinal monitoring.
Abstract
DEViaN-LM is an R package for detecting values that are poorly explained by a Gaussian linear model. The procedure is based on the maximum of the absolute values of the studentized residuals, a statistic whose distribution does not depend on the parameters of the model. This approach generalizes several procedures used to detect abnormal values during the longitudinal monitoring of certain biological markers. In this article, we describe the method and show how to apply it to several real datasets.
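The detection principle summarized above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the DEViaN-LM implementation or its API. It assumes a simple regression design (intercept plus one covariate), simulates standard Gaussian responses to approximate the quantile $c_{\alpha,n}$ of $T_n$ for that fixed design, and flags observations whose externally studentized residual exceeds the threshold in absolute value. The function names `max_abs_ext_studentized`, `mc_threshold`, and `detect`, the returned dictionary keys, and the flagging rule are hypothetical choices made for illustration.

```python
import math
import random


def max_abs_ext_studentized(x, y):
    """Return (T_n, residuals) for the simple regression y = a + b*x + noise.

    Residuals are externally studentized: each one is scaled by a variance
    estimate computed without the observation itself.
    """
    n = len(x)
    p = 2  # number of regression parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    raw = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in raw)
    studentized = []
    for xi, e in zip(x, raw):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of this observation
        s2_loo = (rss - e * e / (1.0 - h)) / (n - p - 1)  # leave-one-out variance
        studentized.append(e / math.sqrt(s2_loo * (1.0 - h)))
    return max(abs(t) for t in studentized), studentized


def mc_threshold(x, alpha=0.05, n_sim=2000, seed=0):
    """Monte-Carlo estimate of the (1 - alpha) quantile of T_n for the design x.

    Because T_n is free of the regression coefficients and the noise variance,
    it is enough to simulate responses from N(0, 1).
    """
    rng = random.Random(seed)
    sims = sorted(
        max_abs_ext_studentized(x, [rng.gauss(0.0, 1.0) for _ in x])[0]
        for _ in range(n_sim)
    )
    k = min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)
    return sims[k]


def detect(x, y, alpha=0.05, n_sim=2000, seed=0):
    """Flag observations whose |studentized residual| exceeds the threshold."""
    _, resid = max_abs_ext_studentized(x, y)
    c = mc_threshold(x, alpha, n_sim, seed)
    outliers = [i for i, t in enumerate(resid) if abs(t) > c]
    return {
        "residuals": resid,
        "threshold": c,
        "outliers": outliers,
        "is_outlier": bool(outliers),
    }


# Usage on synthetic data (not from the paper): a linear trend with unit
# Gaussian noise and one deliberately shifted observation at index 7.
noise_rng = random.Random(1)
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + noise_rng.gauss(0.0, 1.0) for xi in x]
y[7] += 10.0
result = detect(x, y)
print(result["threshold"], result["outliers"])
```

The threshold depends on the design only, so it can be computed once and reused for any response vector observed under the same design. A general design matrix would need a least-squares solver in place of the closed-form simple-regression formulas used here.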
