An Improved Algorithm for Learning Drifting Discrete Distributions

Alessio Mazzetto

TL;DR

The paper addresses learning the current discrete distribution $\bm{\mu}_T$ when the distribution drifts over time and only one sample is observed per time step. It introduces an adaptive algorithm that uses data-dependent bounds on the statistical error to choose how many past samples to use, balancing drift error against variance. The main result shows that, with probability at least $1-\delta$, the estimation error satisfies $\lVert \bm{\mu}_T - \bm{\hat{\mu}} \rVert_{\mathrm{TV}} = O\left(\min_{r}\left[\Lambda_r(\bm{\mu}_T) + \sqrt{\log((\log^2 r + 1)/\delta)/r} + \Delta_r\right]\right)$, where $\Lambda_r(\bm{\mu}_T)$ captures the distribution-dependent learning complexity. In the shared-support case the bound reduces to $O(\sqrt{k/r} + \Delta_r)$, and in the i.i.d. setting with no drift it is near-tight up to logarithmic factors. The algorithm requires no prior knowledge of the drift and handles arbitrary discrete distributions, including those whose support changes over time or is infinite. The underlying technique of data-dependent error bounds may also apply to other problems.
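To make the trade-off in the shared-support rate concrete, here is a short worked calculation. It rests on an assumption that is not part of the paper's statement: suppose the drift error grows at most linearly, $\Delta_r \le r\varepsilon$ for some hypothetical per-step drift $\varepsilon > 0$. Ignoring constants, the bound to minimize over the window size $r$ is then

$$
f(r) = \sqrt{\frac{k}{r}} + r\varepsilon,
\qquad
f'(r) = -\frac{\sqrt{k}}{2}\, r^{-3/2} + \varepsilon = 0
\;\Longrightarrow\;
r^\star = \left(\frac{k}{4\varepsilon^2}\right)^{1/3}.
$$

Substituting back gives

$$
f(r^\star) = (2k\varepsilon)^{1/3} + \left(\frac{k\varepsilon}{4}\right)^{1/3} = \Theta\left((k\varepsilon)^{1/3}\right).
$$

Under this assumption, a smaller $\varepsilon$ favors a longer window and a smaller error. The adaptive algorithm is meant to achieve this kind of balance without being told $\varepsilon$.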

Abstract

We present a new adaptive algorithm for learning discrete distributions under distribution drift. In this setting, we observe a sequence of independent samples from a discrete distribution that is changing over time, and the goal is to estimate the current distribution. Since we have access to only a single sample for each time step, a good estimate requires a careful choice of the number of past samples to use. To use more samples, we must resort to samples further in the past, and we incur a drift error due to the bias introduced by the change in distribution. On the other hand, if we use a small number of past samples, we incur a large statistical error because the estimate has high variance. We present a novel adaptive algorithm that can solve this trade-off without any prior knowledge of the drift. Unlike previous adaptive results, our algorithm characterizes the statistical error using data-dependent bounds. This enables us to overcome the limitations of previous work, which requires a fixed finite support whose size is known in advance and cannot change over time. Additionally, we can obtain tighter bounds that depend on the complexity of the drifting distribution, and we can also consider distributions with infinite support.
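The trade-off described above can be illustrated with a minimal Python sketch. This is not the paper's adaptive algorithm: it only computes the empirical distribution over the last `r` samples for a window size chosen by hand. The function names `window_estimate` and `tv_distance` are hypothetical helpers introduced for illustration.

```python
from collections import Counter


def window_estimate(samples, r):
    """Empirical distribution of the most recent r samples, as a dict."""
    if not 1 <= r <= len(samples):
        raise ValueError("r must be between 1 and len(samples)")
    counts = Counter(samples[-r:])
    return {x: c / r for x, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    # Toy stream: the source emitted "a" early on and now emits "b".
    samples = ["a"] * 6 + ["b"] * 2
    current = {"b": 1.0}  # assumed current distribution for this toy example

    # A short window tracks the current distribution; a long one is biased
    # by stale samples drawn before the change.
    print(tv_distance(window_estimate(samples, 2), current))  # 0.0
    print(tv_distance(window_estimate(samples, 8), current))  # 0.75
```

In this toy stream the long window is worse because of drift. With a stable but noisy source, the short window would instead suffer from high variance. This is why the window size has to be chosen from the data.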

Paper Structure (7 sections, 7 theorems, 48 equations)

Key Result

Theorem 2.1

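Let $\delta \in (0,1)$. There exists an algorithm that, given $X_1,\ldots,X_T$, outputs a distribution $\bm{\hat{\mu}}$ such that, with probability at least $1-\delta$,

$$\lVert \bm{\mu}_T - \bm{\hat{\mu}} \rVert_{\mathrm{TV}} = O\left( \min_{r} \left[ \Lambda_r(\bm{\mu}_T) + \sqrt{\frac{\log\left((\log^2 r + 1)/\delta\right)}{r}} + \Delta_r \right] \right),$$

where $\Delta_r = \max_{0 \leq t < r}\lVert \bm{\mu}_T - \bm{\mu}_{T-t} \rVert_{\mathrm{TV}}$.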
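The quantity $\Delta_r$ can be computed directly when the true distributions are known, which is useful for simulations. The Python sketch below does this and also finds the window that minimizes the shared-support rate $\sqrt{k/r} + \Delta_r$ from the TL;DR. It is an oracle for illustration only, since the paper's algorithm does not have access to the true distributions. The names `drift_error` and `oracle_window` are hypothetical, and constants and logarithmic factors are ignored.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def drift_error(dists, r):
    """Delta_r = max over 0 <= t < r of TV(mu_T, mu_{T-t}).

    dists is the list [mu_1, ..., mu_T], so dists[-1] is mu_T.
    """
    if not 1 <= r <= len(dists):
        raise ValueError("r must be between 1 and len(dists)")
    current = dists[-1]
    return max(tv_distance(current, dists[-1 - t]) for t in range(r))


def oracle_window(dists, k):
    """Window size r minimizing sqrt(k / r) + Delta_r (illustrative oracle)."""
    return min(
        range(1, len(dists) + 1),
        key=lambda r: math.sqrt(k / r) + drift_error(dists, r),
    )
```

Because the maximum is taken over a growing set of time steps, $\Delta_r$ is non-decreasing in $r$, while $\sqrt{k/r}$ decreases. With no drift the oracle picks the full history, and with fast drift it picks a short window.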

Theorems & Definitions (15)

  • Theorem 2.1
  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • proof
  • Proposition 3.5
  • ...and 5 more