kendallknight: An R Package for Efficient Implementation of Kendall's Correlation Coefficient Computation
Mauricio Vargas Sepúlveda
TL;DR
kendallknight addresses the computational bottleneck of Kendall's tau with an $O(n \log(n))$ algorithm implemented in C++ and exposed to R via cpp11. It uses a sorting-and-inversion-counting approach that yields the correlation as $r(x,y) = t / \sqrt{(m - m_x)(m - m_y)}$ with $m = n(n - 1)/2$, where ties are accounted for through the terms $m_x$, $m_y$, and $t_p$. The package also provides significance testing based on a port of the Gamma function, and it ships with thorough tests and cross-language benchmarks. Empirical results show large speedups and lower memory usage than base R's Kendall implementation, enabling fast correlation analyses in econometrics and related fields.
Abstract
The kendallknight package provides an efficient implementation of Kendall's correlation coefficient that significantly reduces processing time for large datasets without sacrificing accuracy. Following Knight (1966) and subsequent literature, it lowers the computational complexity from $O(n^2)$ to $O(n \log(n))$, so that operations that would otherwise take minutes or hours complete in milliseconds or minutes, while maintaining precision and correctly handling edge cases and errors. The package is particularly useful in econometric and statistical contexts where fast and accurate computation of Kendall's correlation coefficient is needed. Benchmarks show substantial performance gains over the base R implementation, especially for large datasets.
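To make the sorting-and-inversion-counting idea concrete, the following is a minimal C++ sketch of a Knight-style computation of $r(x,y) = t / \sqrt{(m - m_x)(m - m_y)}$. It is an illustration under stated assumptions, not the package's actual source: the names `kendall_tau_b`, `sort_count`, and `tied_pairs` are hypothetical, $t$ is assumed to denote concordant minus discordant pairs, $m_x$ and $m_y$ the pairs tied in $x$ and in $y$, and $t_p$ the pairs tied in both, so that $t = m - m_x - m_y + t_p - 2d$ with $d$ the number of discordant pairs. Inputs are assumed to contain no missing values.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Merge sort on v[lo, hi) that returns the number of strict inversions
// (i < j with v[i] > v[j]). buf is scratch space of the same size as v.
static std::uint64_t sort_count(std::vector<double>& v, std::vector<double>& buf,
                                std::size_t lo, std::size_t hi) {
  if (hi - lo < 2) return 0;
  const std::size_t mid = lo + (hi - lo) / 2;
  std::uint64_t inv = sort_count(v, buf, lo, mid) + sort_count(v, buf, mid, hi);
  std::size_t i = lo, j = mid, k = lo;
  while (i < mid && j < hi) {
    if (v[j] < v[i]) {
      inv += mid - i;  // v[j] is smaller than every remaining left element
      buf[k++] = v[j++];
    } else {
      buf[k++] = v[i++];  // equal values are not counted as inversions
    }
  }
  while (i < mid) buf[k++] = v[i++];
  while (j < hi) buf[k++] = v[j++];
  std::copy(buf.begin() + lo, buf.begin() + hi, v.begin() + lo);
  return inv;
}

// Number of tied pairs in a sorted range: a run of k equal items adds k(k-1)/2.
template <class It, class Eq>
static std::uint64_t tied_pairs(It first, It last, Eq eq) {
  std::uint64_t total = 0;
  while (first != last) {
    It run_end = first;
    std::uint64_t k = 0;
    while (run_end != last && eq(*first, *run_end)) { ++run_end; ++k; }
    total += k * (k - 1) / 2;
    first = run_end;
  }
  return total;
}

// Hypothetical helper: Kendall's tau-b in O(n log n). Returns NaN when the
// lengths differ, n < 2, or one variable is constant (zero denominator).
double kendall_tau_b(const std::vector<double>& x, const std::vector<double>& y) {
  const std::size_t n = x.size();
  if (n < 2 || y.size() != n) return std::nan("");

  std::vector<std::pair<double, double>> p(n);
  for (std::size_t i = 0; i < n; ++i) p[i] = {x[i], y[i]};
  std::sort(p.begin(), p.end());  // by x, then by y within ties in x

  using P = std::pair<double, double>;
  const double m = 0.5 * static_cast<double>(n) * static_cast<double>(n - 1);
  const double mx = static_cast<double>(tied_pairs(
      p.begin(), p.end(), [](const P& a, const P& b) { return a.first == b.first; }));
  const double tp = static_cast<double>(tied_pairs(
      p.begin(), p.end(), [](const P& a, const P& b) { return a == b; }));

  std::vector<double> ys(n), buf(n);
  for (std::size_t i = 0; i < n; ++i) ys[i] = p[i].second;
  const double d = static_cast<double>(sort_count(ys, buf, 0, n));  // discordant
  const double my = static_cast<double>(tied_pairs(
      ys.begin(), ys.end(), [](double a, double b) { return a == b; }));

  const double t = m - mx - my + tp - 2.0 * d;  // concordant minus discordant
  const double denom = std::sqrt((m - mx) * (m - my));
  return denom > 0.0 ? t / denom : std::nan("");
}
```

Because the pairs are first sorted by $x$ and then by $y$ within ties, the merge step only counts strict inversions in $y$ across distinct values of $x$, which are exactly the discordant pairs. The sort and the merge each cost $O(n \log(n))$, and the tie counts are single linear passes.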
