On recovering the Radon-Nikodym derivative under the big data assumption

Hanna Myleiko; Sergei Solodky

TL;DR

This work tackles recovering the Radon-Nikodym derivative $\beta=\frac{dq}{dp}$ in a big-data setting by introducing a Nyström subsampling scheme combined with standard Tikhonov regularization. It establishes convergence rates both for the case where $\beta$ lies in the RKHS $\mathcal{H}_{\mathsf{K}}$ and for the case where $\beta \in L_{2,p}\setminus\mathcal{H}_{\mathsf{K}}$. The smoothness of $\beta$ is expressed through an index function $\phi$, and the capacity of the kernel is expressed through the effective dimension. The analysis yields high-probability error bounds in the norms of $\mathcal{H}_{\mathsf{K}}$ and $L_{2,p}$, gives explicit rates for the common choices $\phi(t)=t^{s}$ and $\zeta(t)=t^{r}$, and prescribes a parameter choice that balances the error terms optimally. Importantly, the proposed approach has subquadratic computational cost in the number of observations, which makes density-ratio estimation scalable to large datasets.
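
For orientation, the block below shows one common way the quantities named above are set up in kernel-based density-ratio estimation. It is an assumed formulation and is not copied from the paper, whose operators, norms and constants may differ. Here $L_p$ and $L_q$ denote the kernel integral operators of $p$ and $q$, $\mathbf{1}$ is the constant function, and $R$ is an assumed bound (all of this notation is ours). The identity $L_p\beta = L_q\mathbf{1}$ follows directly from $\beta = dq/dp$, and Tikhonov regularization replaces $L_p$ by $\lambda I + L_p$.

$$
\begin{aligned}
L_p f &= \int_X \mathsf{K}(\cdot,x)\, f(x)\, dp(x),
\qquad
L_p \beta = \int_X \mathsf{K}(\cdot,x)\, \frac{dq}{dp}(x)\, dp(x) = \int_X \mathsf{K}(\cdot,x)\, dq(x) = L_q \mathbf{1},\\
\beta_\lambda &= (\lambda I + L_p)^{-1} L_q \mathbf{1},
\qquad
\mathcal{N}(\lambda) = \operatorname{Tr}\!\left[(\lambda I + L_p)^{-1} L_p\right],
\qquad
\beta = \phi(L_p)\, v,\quad \|v\|_{\mathcal{H}_{\mathsf{K}}} \le R .
\end{aligned}
$$

With $\phi(t)=t^{s}$ this source condition reads $\beta \in L_p^{s}\big(\{v : \|v\|_{\mathcal{H}_{\mathsf{K}}}\le R\}\big)$. In practice $L_p$ and $L_q$ are replaced by their empirical counterparts built from the samples of $p$ and $q$.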

Abstract

The present paper focuses on the problem of recovering the Radon-Nikodym derivative under the big data assumption. To address this problem, we design an algorithm that combines Nyström subsampling with standard Tikhonov regularization. The convergence rate of the resulting algorithm is established both when the Radon-Nikodym derivative belongs to the RKHS and when it does not. We prove that the proposed approach not only attains the same order of accuracy as algorithms that use the whole sample, but also achieves subquadratic computational cost in the number of observations.
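
The following is a minimal sketch of a Nyström-plus-Tikhonov density-ratio estimator. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state. It uses a Gaussian kernel. It minimizes the empirical regularized least-squares functional $\frac{1}{2n}\sum_i \beta(x_i)^2 - \frac{1}{m}\sum_j \beta(x'_j) + \frac{\lambda}{2}\|\beta\|^2_{\mathcal{H}_{\mathsf{K}}}$, where $x_i \sim p$ and $x'_j \sim q$. It restricts $\beta$ to the span of $\mathsf{K}(\cdot, z_l)$ for $k$ points $z_l$ drawn uniformly without replacement from the $p$-sample (plain Nyström subsampling). The names `gaussian_kernel` and `nystrom_tikhonov_ratio`, the bandwidth, and the small jitter term are all hypothetical illustration choices.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)  # (n, d); 1-D input -> d = 1
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_tikhonov_ratio(x_p, x_q, lam, k, bandwidth=1.0, seed=0):
    """Hypothetical sketch: Nystrom-subsampled Tikhonov estimate of beta = dq/dp.

    x_p: n samples from p, x_q: m samples from q.
    beta is searched in span{K(., z_l)}, where z_l are k points drawn from x_p.
    Returns a function that evaluates the estimate at new points.
    """
    x_p = np.asarray(x_p, dtype=float)
    x_q = np.asarray(x_q, dtype=float)
    n, m = len(x_p), len(x_q)
    rng = np.random.default_rng(seed)
    z = x_p[rng.choice(n, size=k, replace=False)]  # Nystrom centres

    K_pz = gaussian_kernel(x_p, z, bandwidth)  # (n, k): beta(x_i) = K_pz @ alpha
    K_qz = gaussian_kernel(x_q, z, bandwidth)  # (m, k): beta(x'_j) = K_qz @ alpha
    K_zz = gaussian_kernel(z, z, bandwidth)    # (k, k): ||beta||_H^2 = alpha @ K_zz @ alpha

    # Setting the gradient of
    #   (1/2n) sum_i beta(x_i)^2 - (1/m) sum_j beta(x'_j) + (lam/2) ||beta||_H^2
    # to zero gives (K_pz^T K_pz / n + lam * K_zz) alpha = K_qz^T 1 / m.
    A = K_pz.T @ K_pz / n + lam * K_zz
    rhs = K_qz.sum(axis=0) / m
    # Tiny jitter (an assumption) guards against a numerically singular K_zz.
    alpha = np.linalg.solve(A + 1e-10 * np.eye(k), rhs)

    def beta_hat(x):
        return gaussian_kernel(x, z, bandwidth) @ alpha

    return beta_hat
```

For example, `beta_hat = nystrom_tikhonov_ratio(x_p, x_q, lam=1e-2, k=50)` returns a callable that can be evaluated at new points. The sketch evaluates $O((n+m)k)$ kernel entries, forms the normal matrix in $O(nk^2)$ operations and solves a $k \times k$ system in $O(k^3)$. The total is therefore subquadratic in $n$ whenever $k$ grows more slowly than $\sqrt{n}$. How large $k$ must be to keep the full-sample accuracy is the subject of the paper's analysis and is not reproduced here.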