Adaptive scheduling for adaptive sampling in POS taggers construction
Manuel Vilares Ferro, Victor M. Darriba Bilbao, Jesús Vilares Ferro
TL;DR
The paper tackles the cost of training large-scale POS taggers by introducing COLTS, an adaptive scheduling algorithm that selects the next training instance using a geometry-driven model of learning curves. It formalizes the learning process with a learning scheme ${\mathcal{D}}^{\mathcal{K}}_{\sigma}$, accuracy patterns ${\pi}$, and a notion of relevance based on the concavity of the trace ${\mathcal{A}}^{\pi}[{\mathcal{D}}^{\mathcal{K}}_{\sigma}]$. It then proves correctness and robustness through step-size guarantees and anchoring mechanisms. A comprehensive testing frame, comprising a uniform baseline, local testing frames, and inflation scenarios, evaluates COLTS against geometric and arithmetic schedules across multiple POS-tagging resources (e.g., Penn Treebank, FROWN) and taggers. The evaluation uses metrics such as data-acquisition cost savings ($\textsc{dacsr}$) and learning cost savings ($\textsc{lcsr}$). Results show that COLTS frequently achieves the best overall learning-cost performance, demonstrating robustness to irregularities and port variations and suggesting broad applicability to NLP tasks beyond tagging. The work provides a rigorous, domain-agnostic framework for adaptive sampling with formal guarantees and practical evaluation, offering immediate benefits for scalable NLP model construction.
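For reference, the arithmetic and geometric schedules that serve as baselines are shown below in their common progressive-sampling form. Arithmetic sizes are $n_i = n_0 + i\delta$, and geometric sizes are $n_i = n_0 r^i$. This is a minimal Python sketch. It assumes integer training-set sizes, and the parameter names (`n0`, `delta`, `ratio`, `n_max`) are hypothetical. The concrete values used in the paper's experiments are not stated here.

```python
from typing import Iterator


def arithmetic_schedule(n0: int, delta: int, n_max: int) -> Iterator[int]:
    """Hypothetical arithmetic baseline: n_i = n0 + i * delta, up to n_max."""
    if n0 <= 0 or delta <= 0:
        raise ValueError("n0 and delta must be positive")
    n = n0
    while n <= n_max:
        yield n
        n += delta


def geometric_schedule(n0: int, ratio: float, n_max: int) -> Iterator[int]:
    """Hypothetical geometric baseline: n_i = round(n0 * ratio**i), up to n_max."""
    if n0 <= 0 or ratio <= 1:
        raise ValueError("n0 must be positive and ratio greater than 1")
    n = float(n0)
    while round(n) <= n_max:
        yield round(n)
        n *= ratio
```

For example, `list(arithmetic_schedule(100, 100, 500))` gives `[100, 200, 300, 400, 500]` and `list(geometric_schedule(100, 2, 1000))` gives `[100, 200, 400, 800]`. In both cases the spacing follows a fixed rule chosen in advance, whatever the learning curve looks like.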
Abstract
We introduce adaptive scheduling for adaptive sampling as a novel machine learning approach to the construction of part-of-speech taggers. The goal is to speed up training on large datasets without significant loss of performance with respect to an optimal configuration. Previous methods use a random, fixed or regularly rising spacing between instances. Ours instead analyzes the shape of the learning curve geometrically, in conjunction with a functional model, so that this spacing can be increased or decreased at any time. The algorithm is proven formally correct with respect to our working hypotheses. Namely, given a case, the next one is the nearest that ensures a net gain in learning ability over the former, and the level of requirement for this condition can be modulated. We also improve the robustness of sampling by paying greater attention to those regions of the training database subject to a temporary inflation in performance, thus preventing learning from stopping prematurely. The proposal has been evaluated on its reliability in identifying the convergence of models, corroborating our expectations. While a concrete halting condition is used for testing, users can choose any halting condition to suit their specific needs.
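To make the idea of a spacing that grows or shrinks with the curve concrete, the sketch below uses a deliberately simple stand-in, and it is not COLTS. It fits a hypothetical logarithmic model $acc(n) \approx \alpha + \beta \ln n$ to the last few points of the learning curve. It then picks the nearest size whose predicted gain over the current one is at least `epsilon`, which plays the role of the requirement level. The functional model, the window, the clamping bounds and all names (`fit_log_model`, `next_size`, `epsilon`, `min_step`, `max_step`, `window`) are assumptions made for illustration. The sketch does not handle inflation.

```python
import math
from typing import Sequence


def fit_log_model(sizes: Sequence[int], accs: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit of the hypothetical model acc(n) ~ alpha + beta * ln(n)."""
    xs = [math.log(n) for n in sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(accs) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct training-set sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def next_size(sizes: Sequence[int], accs: Sequence[float], epsilon: float,
              min_step: int, max_step: int, window: int = 3) -> int:
    """Hypothetical adaptive step (not COLTS): the nearest size whose predicted
    accuracy gain over the current size is at least epsilon, with the step
    clamped to [min_step, max_step]."""
    _, beta = fit_log_model(sizes[-window:], accs[-window:])
    current = sizes[-1]
    # Largest log-ratio ln(n / current) reachable within max_step.
    max_log_ratio = math.log((current + max_step) / current)
    if beta <= 0 or epsilon / beta >= max_log_ratio:
        # No predicted gain, or the gain needs more than max_step: take max_step.
        step = max_step
    else:
        # beta * ln(n / current) >= epsilon  <=>  n >= current * exp(epsilon / beta)
        step = math.ceil(current * math.exp(epsilon / beta)) - current
    return current + min(max(step, min_step), max_step)
```

Solving $\beta \ln(n/n_{cur}) \ge \epsilon$ gives $n \ge n_{cur}\, e^{\epsilon/\beta}$. A flattening curve (smaller $\beta$) therefore stretches the next step, and a steeper one shortens it. Take a curve following $0.5 + 0.05 \ln(n/100)$ sampled at 100, 200 and 400 instances. There, `next_size(sizes, accs, epsilon=0.01, min_step=10, max_step=1000)` returns 489, and raising `epsilon` to 0.05 stretches the next size to 1088.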
