
Absolute convergence and error thresholds in non-active adaptive sampling

Manuel Vilares Ferro, Victor M. Darriba Bilbao, Jesús Vilares Ferro

TL;DR

The paper addresses the problem of determining when to stop data collection in non-active adaptive sampling by introducing absolute convergence and error thresholds. It develops a formal framework around learning traces, accuracy patterns, and anchoring techniques to ensure correctness, robustness, and completeness. The main contributions are the fixed anchoring mechanism, which guarantees decreasing backbones; flexible anchoring variants with look-ahead; and a rigorous testing frame to quantify cost-benefit trade-offs, demonstrated on NLP part-of-speech (POS) tagging. The findings show that absolute thresholds provide reliable stopping criteria at a higher computational cost, with practical guidance on anchor selection and look-ahead to balance efficiency against convergence guarantees.

Abstract

Non-active adaptive sampling is a way of building machine learning models from a training database while dynamically and automatically deriving a guaranteed sample size. In this context, and regardless of the strategy used for scheduling and generating weak predictors, we describe a proposal for calculating absolute convergence and error thresholds. This not only makes it possible to establish when the quality of the model no longer increases, but also supplies a proximity condition to estimate, in absolute terms, how close it is to achieving that goal, thus supporting decision making when fine-tuning learning parameters in model selection. The technique is proven correct and complete with respect to our working hypotheses, and it also strengthens the robustness of the sampling scheme. Tests meet our expectations and illustrate the proposal in the domain of natural language processing, taking the generation of part-of-speech taggers as a case study.
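The abstract separates two questions: whether the model has stopped improving (convergence) and how far it still is from its ceiling (absolute error). As a rough illustration only, the sketch below combines both into one stopping rule, assuming a sequence of asymptote estimates (one per sampling iteration) and the observed accuracies are already available. The function name `stopping_index`, the tolerances `conv_tol` and `err_tol`, and the look-ahead window are hypothetical choices for this sketch; they are not the paper's formal definitions of its thresholds.

```python
from typing import Optional, Sequence


def stopping_index(backbone: Sequence[float], accuracies: Sequence[float],
                   conv_tol: float, err_tol: float,
                   lookahead: int = 1) -> Optional[int]:
    """Hypothetical stopping rule (not the paper's exact criterion).

    Returns the first iteration i at which
      (1) the asymptote estimates over the last `lookahead` steps span
          less than conv_tol (convergence), and
      (2) the observed accuracy lies within err_tol of the current
          asymptote estimate (absolute proximity to the ceiling).
    Returns None if both conditions never hold together.
    """
    if len(accuracies) < len(backbone):
        raise ValueError("need one observed accuracy per asymptote estimate")
    for i in range(lookahead, len(backbone)):
        window = backbone[i - lookahead: i + 1]
        converged = max(window) - min(window) < conv_tol
        close_enough = abs(backbone[i] - accuracies[i]) < err_tol
        if converged and close_enough:
            return i
    return None
```

For example, `stopping_index(backbone, accuracies, conv_tol=0.005, err_tol=0.01, lookahead=2)` waits until three consecutive asymptote estimates agree to within 0.005 and the measured accuracy is within one point of the estimate. A larger look-ahead makes the rule more conservative at the price of sampling for longer.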

Paper Structure (38 sections, 10 theorems, 59 equations, 10 figures, 5 tables)

Key Result

Theorem 1

(Canonical anchoring) Let $\mathcal{A}^\pi[\mathcal{D}^{\mathcal{K}}_{\sigma}]$ be a learning trace with asymptotic backbone $\{\alpha_i\}_{i \in \mathbb{N}}$, and $\{\hat{\mathcal{A}}_i(\infty)\}_{i > \omega}$ the sequence defined from its working level $\omega$ as …, with $\hat{\mathcal{A}}_{i}^\pi[\mathcal{D}^{\mathcal{K}}_{\sigma}]$ a curve fitting $\{[x_j, \mathcal{A}_{\pmb{\infty}}[\ldots]]\}$ …
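To make the objects in Theorem 1 a little more concrete, here is a minimal sketch of how an asymptotic backbone could be computed from a learning trace: for each prefix of the trace beyond a working level `omega`, fit a learning curve and keep its asymptote. The sketch assumes an inverse power law $y \approx a - b\,x^{-c}$ as the accuracy pattern, a common learning-curve model that is not necessarily the one used in the paper, and it applies no anchoring. The names `fit_power_law`, `asymptotic_backbone`, and `C_GRID` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical grid of exponents c to try: 0.05, 0.10, ..., 2.00.
C_GRID: Tuple[float, ...] = tuple(k / 20 for k in range(1, 41))


def fit_power_law(xs: Sequence[float], ys: Sequence[float],
                  cs: Sequence[float] = C_GRID) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ a - b * x**(-c); returns (a, b, c).

    For a fixed c the model is linear in (a, b), so each candidate c is
    solved in closed form and the c with the smallest squared error is kept.
    """
    n = len(xs)
    best = None
    for c in cs:
        zs = [x ** (-c) for x in xs]
        mean_z = sum(zs) / n
        mean_y = sum(ys) / n
        szz = sum((z - mean_z) ** 2 for z in zs)
        if szz == 0.0:
            continue  # degenerate: all sample sizes are equal
        szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
        slope = szy / szz            # regression slope of y on z, equals -b
        a = mean_y - slope * mean_z  # intercept: predicted y as x -> infinity
        sse = sum((y - (a + slope * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, c)
    if best is None:
        raise ValueError("need at least two distinct sample sizes")
    _, a, b, c = best
    return a, b, c


def asymptotic_backbone(xs: Sequence[float], ys: Sequence[float],
                        omega: int) -> List[float]:
    """Asymptote estimate alpha_i for every trace prefix of length i > omega."""
    if omega < 1:
        raise ValueError("omega must be at least 1 (two points per fit)")
    return [fit_power_law(xs[:i], ys[:i])[0]
            for i in range(omega + 1, len(xs) + 1)]
```

A usage example on a synthetic trace whose true ceiling is 0.95:

```python
xs = [100 * 2 ** k for k in range(8)]
ys = [0.95 - 0.5 * x ** -0.5 for x in xs]
print(asymptotic_backbone(xs, ys, omega=3))
```

Solving the exponent by grid search keeps the sketch dependency-free; a real implementation would more likely use a nonlinear least-squares routine, and on noisy traces the resulting backbone need not be decreasing, which is the kind of behavior the paper's anchoring techniques are meant to control.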

Figures (10)

  • Figure 1: Learning curve for fnTBL on the Frown corpus, and an accuracy pattern fitting it.
  • Figure 2: Learning trace for fnTBL on Frown, with zoomed-in details.
  • Figure 3: Working and prediction levels without and with canonical anchors for fnTBL on Frown, with zoomed-in details.
  • Figure 4: Asymptotic backbones without and with fixed anchors for MaxEnt on Penn.
  • Figure 5: Asymptotic backbones with fixed anchors for MaxEnt on Penn (look-ahead $\imath$, value $\beta$).
  • ...and 5 more figures

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • ...and 11 more