On computing and the complexity of computing higher-order $U$-statistics, exactly

Xingyu Chen, Ruiqi Zhang, Lin Liu

Abstract

Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill this gap by presenting several results on the computational aspects of $U$-statistics. First, we derive a useful decomposition of an $m$-th order $U$-statistic into a linear combination of $V$-statistics with orders not exceeding $m$, which are generally more feasible to compute. Second, we explore the connection between exactly computing $V$-statistics and Einstein summation, a tool often used in computational mathematics and quantum computing to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing $U$-statistics, based on the treewidth of a particular graph associated with the $U$-statistic kernel. Together, these ingredients lead to (1) a new, much more runtime-efficient algorithm to exactly compute general higher-order $U$-statistics, and (2) a more streamlined characterization of the runtime complexity of computing $U$-statistics. We develop an accompanying open-source package called \texttt{u-stats} in both Python (https://github.com/zrq1706/U-Statistics-Python) and R (https://github.com/cxy0714/U-Statistics-R). Through three examples in statistics, we demonstrate that \texttt{u-stats} achieves impressive runtime performance compared to existing benchmarks. This paper also aspires to achieve two goals: (1) to capture the interest of researchers in statistics and related areas to further advance the algorithmic development of $U$-statistics, and (2) to lift the burden of implementing higher-order $U$-statistics from practitioners.
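
The decomposition of a $U$-statistic into $V$-type sums can be illustrated on a small case. The sketch below is an illustration under stated assumptions, not the u-stats API and not necessarily the paper's exact decomposition. It assumes a third-order kernel that factorizes as $h(X_i, X_j, X_k) = g_1(X_i, X_j)\, g_2(X_j, X_k)$, with hypothetical precomputed $n \times n$ matrices `A[i, j]` $= g_1(X_i, X_j)$ and `B[j, k]` $= g_2(X_j, X_k)$, and it averages over ordered triples of distinct indices. The sum over distinct indices is rewritten by inclusion-exclusion (Möbius inversion over the partitions of $\{i, j, k\}$) as $S_{\emptyset} - S_{i=j} - S_{j=k} - S_{i=k} + 2 S_{i=j=k}$, where each $S$ is an unrestricted sum with the indicated indices tied together and is a single `np.einsum` call.

```python
import itertools

import numpy as np


def u_stat_bruteforce(A, B):
    """Third-order U-statistic with (assumed) kernel A[i, j] * B[j, k],
    averaged over ordered triples of distinct indices: an O(n^3) loop."""
    n = A.shape[0]
    total = 0.0
    for i, j, k in itertools.permutations(range(n), 3):
        total += A[i, j] * B[j, k]
    return total / (n * (n - 1) * (n - 2))


def u_stat_via_v_stats(A, B):
    """Same quantity as a signed combination of V-type sums (inclusion-exclusion
    over which of the indices i, j, k coincide), each evaluated with np.einsum."""
    n = A.shape[0]
    s_free = np.einsum("ij,jk->", A, B)   # i, j, k unrestricted
    s_ij = np.einsum("ii,ik->", A, B)     # tie i == j
    s_jk = np.einsum("ij,jj->", A, B)     # tie j == k
    s_ik = np.einsum("ij,ji->", A, B)     # tie i == k
    s_ijk = np.einsum("ii,ii->", A, B)    # tie i == j == k
    distinct = s_free - s_ij - s_jk - s_ik + 2.0 * s_ijk
    return distinct / (n * (n - 1) * (n - 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 30
    A = rng.normal(size=(n, n))  # placeholder kernel matrices
    B = rng.normal(size=(n, n))
    print(np.isclose(u_stat_bruteforce(A, B), u_stat_via_v_stats(A, B)))  # True
```

The brute-force loop makes $n(n-1)(n-2)$ kernel evaluations, while every $V$-type term above factorizes into sums that take $O(n^2)$ work, e.g. $S_{\emptyset} = \sum_j \big(\sum_i A_{ij}\big)\big(\sum_k B_{jk}\big)$.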

Paper Structure

This paper contains 24 sections, 15 theorems, 96 equations, 5 figures, 8 tables, 4 algorithms.

Key Result

Proposition 1

Let $\mathcal{A}$ be an Einsum notation with no output, and let $G_{\mathcal{A}}$ be the corresponding decomposition graph of $\mathcal{A}$. Then there exists an algorithm with a particular Einsum ordering such that, for any set of tensors $\mathcal{T}$ that can be represented by the Einsum notation $\mathcal{A}$, …
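
To make the role of an Einsum ordering concrete, the following sketch uses NumPy's own contraction-path optimizer (`np.einsum_path`), which is not the algorithm behind Proposition 1, on a chain-structured Einsum with no output, $\sum_{i,j,k,l} A_{ij} B_{jk} C_{kl}$. The matrices are random placeholders. Evaluated naively, the sum ranges over $n^4$ index tuples; the hand-written order at the end sums out one index at a time, keeps every intermediate a vector, and costs $O(n^2)$.

```python
import numpy as np

# Chain-structured Einsum with no output: all indices i, j, k, l are summed.
rng = np.random.default_rng(0)
n = 40
A, B, C = (rng.normal(size=(n, n)) for _ in range(3))

# NumPy's optimizer picks a pairwise contraction order and reports its cost.
path, report = np.einsum_path("ij,jk,kl->", A, B, C, optimize="optimal")
print(path)    # a list starting with 'einsum_path', then pairwise contractions
print(report)  # includes the naive and optimized scaling

# Evaluate using that order.
fast = np.einsum("ij,jk,kl->", A, B, C, optimize=path)

# Hand-written elimination order: sum out i, then j, then k, then l.
manual = ((A.sum(axis=0) @ B) @ C).sum()
print(np.isclose(fast, manual))  # True
```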

Figures (5)

  • Figure 1: Examples of decomposition graphs of different decomposition signatures.
  • Figure 2: The decomposition graph $G_{\mathcal{A}_{S}}$, which consists of $a + |S|$ isolated vertices.
  • Figure 3: Graphs $G_1$ to $G_{15}$ with increasing numbers of edges and maximum treewidth.
  • Figure 4: Two decomposition graphs used in the HOIF appendix.
  • Figure 5: All non-degenerate isomorphism classes of 3-vertex and 4-vertex simple undirected graphs.

Theorems & Definitions (58)

  • Definition 1: $U$-Statistic
  • Definition 2: $V$-Statistic
  • Definition 3: Tensor
  • Definition 4: Einsum Notation
  • Definition 5: Einsum Operation
  • Example 1
  • Remark 1
  • Remark 2
  • Definition 6: Vertex Elimination
  • Definition 7: Treewidth
  • ...and 48 more
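
Definitions 6 and 7 concern vertex elimination and treewidth. The small sketch below rests on an assumption made only for illustration: a kernel's decomposition graph has one vertex per index and an edge between any two indices that appear in the same factor (e.g. the signature `["ij", "jk", "kl"]`). Eliminating a vertex joins its remaining neighbours into a clique; the width of an elimination order is the largest such neighbourhood; and the treewidth is the minimum width over all orders. In the standard variable-elimination argument, an order of width $w$ lets a sum of products over all indices be evaluated in roughly $O(n^{w+1})$ time; the paper's own statements give the precise bound. The helper names are hypothetical and are not part of u-stats.

```python
from itertools import permutations


def decomposition_graph(signature):
    """Hypothetical helper: one vertex per index letter, plus an edge between
    two letters whenever they appear together in some factor."""
    vertices = sorted(set("".join(signature)))
    edges = set()
    for factor in signature:
        for a in factor:
            for b in factor:
                if a < b:
                    edges.add((a, b))
    return vertices, edges


def elimination_width(vertices, edges, order):
    """Eliminate vertices in `order`, turning each eliminated vertex's remaining
    neighbours into a clique; return the largest neighbourhood size seen."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}  # fill-in edges among the remaining neighbours
    return width


def treewidth_bruteforce(vertices, edges):
    """Exact treewidth as the minimum width over all elimination orders;
    only feasible for the handful of indices a kernel typically has."""
    return min(elimination_width(vertices, edges, order)
               for order in permutations(vertices))


if __name__ == "__main__":
    print(treewidth_bruteforce(*decomposition_graph(["ij", "jk", "kl"])))        # 1
    print(treewidth_bruteforce(*decomposition_graph(["ij", "jk", "kl", "li"])))  # 2
```

Brute force over all orders is exponential in the number of indices, so it only suits small kernels; for larger graphs one would typically fall back on heuristic orders such as min-degree, which give an upper bound on the treewidth.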