Adaptive scheduling for adaptive sampling in POS taggers construction

Manuel Vilares Ferro; Victor M. Darriba Bilbao; Jesús Vilares Ferro

TL;DR

The paper tackles the cost of training large-scale POS taggers by introducing COLTS, an adaptive scheduling algorithm that selects the next training instance using a geometry-driven model of learning curves. It formalizes the learning process with a learning scheme ${\mathcal{D}}^{\mathcal{K}}_{\sigma}$, accuracy patterns ${\pi}$, and a notion of relevance based on the concavity of the trace ${\mathcal{A}}^{\pi}[{\mathcal{D}}^{\mathcal{K}}_{\sigma}]$, and it proves correctness and robustness through step-size guarantees and anchoring mechanisms. A testing frame comprising a uniform baseline, local testing frames and inflation scenarios evaluates COLTS against geometric and arithmetic schedules across several POS-tagging resources (e.g., Penn Treebank, FROWN) and taggers, using metrics such as data-acquisition cost savings (${\sc dacsr}$) and learning cost savings (${\sc lcsr}$). Results show that COLTS frequently achieves the best overall learning-cost performance and is robust to irregularities and port variations, suggesting broad applicability to NLP tasks beyond tagging. The work provides a rigorous, domain-agnostic framework for adaptive sampling with formal guarantees and practical evaluation, of direct use in scalable NLP model construction.
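The TL;DR compares COLTS with geometric and arithmetic schedules and reports cost savings. As a hedged illustration only, the Python sketch below generates both baseline schedules and computes savings ratios under two assumed cost definitions: learning cost as the total number of instances processed when the tagger is retrained on every sample, and data-acquisition cost as the size of the largest (nested) sample. All function names are hypothetical, and the paper's dacsr and lcsr metrics may be defined differently.

```python
# Hypothetical sketch: baseline sampling schedules and illustrative savings.
# ASSUMPTION: the cost definitions below are stand-ins, not the paper's
# dacsr/lcsr formulas.


def arithmetic_schedule(start: int, step: int, limit: int) -> list[int]:
    """Sizes start, start + step, start + 2*step, ... not exceeding limit."""
    return list(range(start, limit + 1, step))


def geometric_schedule(start: int, ratio: float, limit: int) -> list[int]:
    """Sizes start, start*ratio, start*ratio**2, ... not exceeding limit."""
    if ratio <= 1.0:
        raise ValueError("ratio must be > 1 for the schedule to grow")
    sizes: list[int] = []
    size = float(start)
    while size <= limit:
        # Skip duplicates produced by truncation when the ratio is small.
        if not sizes or int(size) > sizes[-1]:
            sizes.append(int(size))
        size *= ratio
    return sizes


def learning_cost(sizes: list[int]) -> int:
    """Instances processed if the tagger is retrained from scratch each round."""
    return sum(sizes)


def data_acquisition_cost(sizes: list[int]) -> int:
    """Instances to annotate, assuming nested samples (only the largest counts)."""
    return max(sizes)


def savings_ratio(cost: float, baseline_cost: float) -> float:
    """Fraction of the baseline cost saved (0.0 = none, 1.0 = everything)."""
    return 1.0 - cost / baseline_cost


if __name__ == "__main__":
    fine = arithmetic_schedule(1000, 1000, 16000)  # hypothetical reference
    doubling = geometric_schedule(1000, 2.0, 16000)
    print(savings_ratio(learning_cost(doubling), learning_cost(fine)))
```

With a constant step of 1,000 up to 16,000 instances, the reference schedule processes 136,000 instances in total and the doubling schedule 31,000, a learning-cost saving of roughly 77% under these assumed definitions, while both end at the same sample size. Neither schedule looks at the learning curve, which is the gap an adaptive schedule aims to close.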

Abstract

We introduce adaptive scheduling for adaptive sampling as a novel machine learning approach to building part-of-speech taggers. The goal is to speed up training on large data sets without significant loss of performance relative to an optimal configuration. Unlike previous methods, which use random, fixed or regularly increasing spacing between instances, ours analyzes the shape of the learning curve geometrically, in conjunction with a functional model, to increase or decrease that spacing at any time. The algorithm is formally correct with respect to our working hypotheses: given a case, the next one is the nearest that ensures a net gain in learning ability over the former, and the level of requirement for this condition can be modulated. We also improve the robustness of sampling by paying greater attention to those regions of the training database subject to a temporary inflation in performance, thus preventing learning from stopping prematurely. The proposal has been evaluated on its reliability in identifying the convergence of models, corroborating our expectations. While a concrete halting condition is used for testing, users can choose any condition that suits their specific needs.
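The abstract's rule, "the next case is the nearest one ensuring a net gain", can be illustrated with a small Python sketch. It is not the COLTS algorithm: it assumes a hypothetical saturating accuracy model y = a - b/x fitted by least squares, picks the smallest size the fitted model predicts will gain at least `min_gain`, and takes the halting condition as a pluggable predicate. The default predicate (`patience_halt`, a hypothetical name) waits for several consecutive small gains, a generic stand-in for the paper's attention to temporary inflation, not its anchoring mechanism. At least two distinct initial sizes are needed for the fit.

```python
# Hypothetical sketch of adaptive step selection; not the COLTS algorithm.
# ASSUMPTIONS:
#   * accuracy follows the saturating model y = a - b / x, fitted by least
#     squares on the (size, accuracy) points observed so far;
#   * the next size is the smallest x' the fitted model predicts to gain at
#     least `min_gain` accuracy over the last observed size;
#   * halting is a pluggable predicate; the default waits for `patience`
#     consecutive small observed gains, so one temporary plateau does not
#     stop learning prematurely.
import math
from typing import Callable, Optional

Point = tuple[int, float]  # (training-set size, measured accuracy)


def fit_saturating(points: list[Point]) -> tuple[float, float]:
    """Least-squares fit of y = a - b / x, i.e. a line in u = 1 / x."""
    us = [1.0 / x for x, _ in points]
    ys = [y for _, y in points]
    mean_u = sum(us) / len(us)
    mean_y = sum(ys) / len(ys)
    var_u = sum((u - mean_u) ** 2 for u in us)  # needs >= 2 distinct sizes
    cov_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    slope = cov_uy / var_u  # slope of y against 1/x, which equals -b
    return mean_y - slope * mean_u, -slope


def next_size(points: list[Point], min_gain: float) -> Optional[int]:
    """Smallest size whose predicted gain over the last size is >= min_gain."""
    _, b = fit_saturating(points)
    x_last = points[-1][0]
    if b <= 0:
        return None  # fitted curve is flat or decreasing: no gain expected
    # b * (1/x_last - 1/x') >= min_gain  <=>  1/x' <= 1/x_last - min_gain/b
    bound = 1.0 / x_last - min_gain / b
    if bound <= 0:
        return None  # headroom below the asymptote is at most min_gain
    return max(x_last + 1, math.ceil(1.0 / bound))


def patience_halt(points: list[Point], min_gain: float, patience: int = 3) -> bool:
    """Stop after `patience` consecutive observed gains below min_gain."""
    if len(points) <= patience:
        return False
    recent = points[-(patience + 1):]
    return all(y2 - y1 < min_gain for (_, y1), (_, y2) in zip(recent, recent[1:]))


def adaptive_sampling(
    train_and_eval: Callable[[int], float],
    initial_sizes: list[int],
    min_gain: float,
    max_size: int,
    halt: Callable[[list[Point], float], bool] = patience_halt,
) -> list[Point]:
    """Grow the training set until `halt` fires or the model predicts no gain."""
    points = [(x, train_and_eval(x)) for x in initial_sizes]
    while not halt(points, min_gain):
        x = next_size(points, min_gain)
        if x is None or x > max_size:
            break
        points.append((x, train_and_eval(x)))
    return points


if __name__ == "__main__":
    # Simulated tagger whose accuracy exactly follows the assumed model.
    trace = adaptive_sampling(lambda x: 0.97 - 50.0 / x, [100, 200, 400], 0.005, 100_000)
    print([x for x, _ in trace])
```

Because the required gain is fixed while the curve flattens, the steps between sizes widen automatically, and sampling stops once the fitted headroom below the asymptote drops to `min_gain` or less. Swapping `halt` for another predicate mirrors the abstract's point that any halting condition can be used.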

Paper Structure

This paper contains 31 sections, 4 theorems, 23 equations, 10 figures and 2 tables.

Introduction
The state of the art
Sampling scheduling
Our contribution
The formal framework
The working hypotheses
The notational support
The abstract model
Correctness
Robustness
Irregularities before the working level
Irregularities after the working level
The testing frame
The monitoring architecture
The testing rounds
...and 16 more sections

Key Result

Theorem 1

Let ${\mathcal{A}}^\pi[{\mathcal{D}}^{\mathcal{K}}_{\sigma}]$ be a learning trace. Then: … where $x_i := \left\Vert \mathcal{D}_i \right\Vert$ and $y=\alpha_i$ is the horizontal asymptote for ${\mathcal{A}}_i^\pi[{\mathcal{D}}^{\mathcal{K}}_{\sigma}]$.
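To make the role of the horizontal asymptote concrete, consider a hypothetical accuracy pattern of the saturating form below. This is an assumption for illustration; the paper's patterns $\pi$ need not have this form. Here $x$ plays the role of $x_i$ and $\alpha$ that of $\alpha_i$. The derivation shows the asymptote, the concavity that makes later gains ever more expensive, and the smallest size that still delivers a gain of at least $\varepsilon$.

```latex
% Illustrative accuracy pattern (assumption): y = \alpha - \beta / x, with \beta > 0
\begin{aligned}
  y(x) &= \alpha - \frac{\beta}{x}, \qquad x > 0,\ \beta > 0,\\
  \lim_{x \to \infty} y(x) &= \alpha
    \quad\Rightarrow\quad y = \alpha \text{ is the horizontal asymptote},\\
  y'(x) &= \frac{\beta}{x^{2}} > 0, \qquad y''(x) = -\frac{2\beta}{x^{3}} < 0,\\
  y(x') - y(x) &= \beta\left(\frac{1}{x} - \frac{1}{x'}\right) \ge \varepsilon
    \iff x' \ge \left(\frac{1}{x} - \frac{\varepsilon}{\beta}\right)^{-1},
    \quad \text{provided } \frac{1}{x} > \frac{\varepsilon}{\beta}.
\end{aligned}
```

Once $\alpha - y(x) = \beta/x \le \varepsilon$, no finite $x'$ yields the required gain, which is one simple way a fitted pattern can signal convergence.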

Figures (10)

Figure 1: Learning curve for fnTBL on FROWN, and an accuracy pattern fitting it.
Figure 2: Learning trace for fnTBL on FROWN.
Figure 3: Dynamic computation of the size of individuals in a learning trace.
Figure 4: Learning trends for fnTBL on FROWN, using uniform and adaptive step functions.
Figure 5: Working and prediction levels for fnTBL on FROWN.
...and 5 more figures

Theorems & Definitions (12)

Definition 1
Definition 2
Definition 3
Definition 4
Theorem 1
Definition 5
Theorem 2
Definition 6
Definition 7
Theorem 3
...and 2 more