Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation
Xiaofeng Cao, Mingwei Xu, Xin Yu, Jiangchao Yao, Wei Ye, Shengjun Huang, Minling Zhang, Ivor W. Tsang, Yew Soon Ong, James T. Kwok, Heng Tao Shen
TL;DR
This paper addresses robust generalization when labeled data are scarce by formulating agnostic active sampling within the Probably Approximately Correct (PAC) framework to bound generalization error and label complexity in model-agnostic settings. It develops a theoretical backbone linking few-shot and active learning, introduces error disagreement as a central mechanism, and derives both supervised and unsupervised label-complexity bounds that guide low-resource learning. Building on these guarantees, it surveys optimization strategies (gradient-informed, meta-iteration, geometry-aware, and LLM-powered) and analyzes how these tools operate in domain transfer, reinforcement feedback, and hierarchical structure modeling. The work bridges theory and practice by outlining concrete conditions under which low-resource data can approximate high-resource performance and by profiling practical paradigms for data-efficient AI, with broad implications for scalable, resource-constrained learning systems.
Abstract
Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI); however, the costs associated with data annotation and model training remain significant. A fundamental objective of AI research is to achieve robust generalization with low-resource data. This survey employs agnostic active sampling theory within the Probably Approximately Correct (PAC) framework to analyze the generalization error and label complexity associated with learning from low-resource data in both model-agnostic supervised and unsupervised settings. Based on this analysis, we investigate a suite of optimization strategies tailored for low-resource data learning, including gradient-informed optimization, meta-iteration optimization, geometry-aware optimization, and LLM-powered optimization. Furthermore, we provide a comprehensive overview of multiple learning paradigms that can benefit from low-resource data, including domain transfer, reinforcement feedback, and hierarchical structure modeling. Finally, we conclude by summarizing the key findings and highlighting their implications for learning with low-resource data.
