On recovering the Radon-Nikodym derivative under the big data assumption

Hanna Myleiko, Sergei Solodky

TL;DR

This work addresses the recovery of the Radon-Nikodym derivative $β=\frac{dq}{dp}$ in a big-data setting by combining Nyström subsampling with standard Tikhonov regularization. It establishes convergence rates both when $β$ lies in the RKHS $\mathcal{H}_{\mathsf{K}}$ and when $β$ lies in $L_{2,p}\setminus\mathcal{H}_{\mathsf{K}}$, with the smoothness of $β$ described by an index function $\phi$ and the capacity of the kernel by the effective dimension. The analysis yields high-probability bounds in the $\mathcal{H}_{\mathsf{K}}$ and $L_{2,p}$ norms, with explicit rate expressions under the common choices $\phi(t)=t^{s}$ and $\zeta(t)=t^{r}$, and prescribes a parameter selection that balances the error terms. The proposed approach attains computational cost that is subquadratic in the number of observations, enabling scalable density-ratio estimation for large datasets.

Abstract

The present paper focuses on the problem of recovering the Radon-Nikodym derivative under the big data assumption. To address this problem, we design an algorithm that combines Nyström subsampling with standard Tikhonov regularization. The convergence rate of the algorithm is established both in the case when the Radon-Nikodym derivative belongs to the RKHS and in the case when it does not. We prove that the proposed approach not only ensures the same order of accuracy as algorithms based on the whole sample, but also achieves subquadratic computational cost in the number of observations.
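To make the combination of Nyström subsampling and Tikhonov regularization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' exact estimator: the Gaussian kernel, uniform landmark sampling, the small diagonal jitter, and all names (`gaussian_kernel`, `nystrom_density_ratio`, `n_landmarks`, `alpha`, `bandwidth`) are hypothetical choices. Given $n$ samples from $p$ and $m$ samples from $q$, it searches for $β$ in the span of kernel sections at $M$ landmarks and minimizes $\frac{1}{n}\sum_i β(x_i)^2 - \frac{2}{m}\sum_j β(x'_j) + \alpha\|β\|^2_{\mathcal{H}_{\mathsf{K}}}$.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_density_ratio(x_p, x_q, n_landmarks, alpha, bandwidth=1.0, seed=0):
    """Tikhonov-regularized estimate of dq/dp restricted to a Nystrom subspace.

    x_p: (n, d) samples from p; x_q: (m, d) samples from q.
    Returns a callable estimate and its coefficient vector.
    """
    rng = np.random.default_rng(seed)
    n = x_p.shape[0]
    # Assumption: landmarks are drawn uniformly without replacement from x_p.
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    z = x_p[idx]

    k_pz = gaussian_kernel(x_p, z, bandwidth)  # (n, M)
    k_qz = gaussian_kernel(x_q, z, bandwidth)  # (m, M)
    k_zz = gaussian_kernel(z, z, bandwidth)    # (M, M)

    # Normal equations of the regularized least-squares objective:
    # ((1/n) K_pz^T K_pz + alpha K_zz) c = (1/m) K_qz^T 1
    a_mat = k_pz.T @ k_pz / n + alpha * k_zz
    # Assumption: a small jitter keeps the M x M system well conditioned.
    a_mat += 1e-8 * np.eye(len(idx))
    rhs = k_qz.mean(axis=0)
    coef = np.linalg.solve(a_mat, rhs)

    def beta_hat(x):
        """Evaluate the estimated ratio at the rows of x (k, d)."""
        return gaussian_kernel(x, z, bandwidth) @ coef

    return beta_hat, coef
```

In this sketch only $n \times M$, $m \times M$ and $M \times M$ kernel blocks are formed, so the cost is of order $(n+m)M^2 + M^3$ and no $n \times n$ matrix is ever built, which is where the saving comes from when $M$ is much smaller than $n$. How $M$ and $\alpha$ should scale with the sample size is the subject of the paper's analysis and is not encoded here.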

Paper Structure

This paper contains 13 sections, 18 theorems, 166 equations.

Key Result

Proposition 2.1

(LuPer) Let the regularization method be indexed by $g_{\alpha}(t)$ and have qualification $p$. If this qualification covers the index function $\phi$, then … where $\hat{\gamma} = \max\{\gamma_{0}, \gamma_{p}\}$.

Theorems & Definitions (23)

  • Definition 2.1
  • Proposition 2.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Proposition 3.7
  • Proposition 3.8
  • Lemma 3.9
  • ...and 13 more