Practical machine learning is learning on small samples
Marina Sapir
TL;DR
This work reframes machine learning from a statistical, asymptotic paradigm to a practical, logic-grounded one. It argues that real-world learning operates under implicit smoothness and finite data, and introduces the Practical learning paradigm built on baseline cases, counterparts, and inconsistency measures. By showing that common algorithms like ERM, k-NN, decision trees, Naive Bayes, and linear SVM/SVR can be interpreted as practical learners minimizing total inconsistency, the paper justifies a unifying framework grounded in abduction and oscillation-minimizing criteria. The approach enables meaningful comparisons, robust testing, and handling of outliers and data-scarce scenarios, with broad implications for practice and future extensions to other ML problems.
Abstract
Based on limited observations, machine learning discerns a dependence that is expected to hold in the future. What makes this possible? Statistical learning theory imagines an indefinitely increasing training sample to justify its approach. In reality, there is no infinite time, nor even an infinite general population, for learning. Here I argue that practical machine learning rests on an implicit assumption that the underlying dependence is relatively "smooth": most likely, there are no abrupt differences in feedback between cases with close data points. From this point of view, learning should involve selecting the hypothesis that "smoothly" approximates the training set. I formalize this as the Practical learning paradigm. The paradigm includes terminology and rules for describing learners. Popular learners (local smoothing, k-NN, decision trees, Naive Bayes, SVM for classification and for regression) are shown here to be implementations of this paradigm.
