A Survey on Stability of Learning with Limited Labelled Data and its Sensitivity to the Effects of Randomness

Branislav Pecher, Ivan Srba, Maria Bielikova

TL;DR

This survey analyzes how randomness impairs stability when learning from limited labelled data through prompting, in-context learning, meta-learning, fine-tuning, and parameter-efficient fine-tuning (PEFT). It introduces a taxonomy of tasks (Investigate, Determine, Mitigate, Benchmark/Compare/Report) and randomness factors (input data, model/training, implementation/hardware, and systematic changes), and reviews 415 papers to synthesize findings. The review reveals pervasive and sometimes conflicting evidence of randomness-driven instability, with data order and sample choice often driving large effects, especially in language-model-based and out-of-distribution settings. It highlights ensemble methods and targeted mitigations as common remedies, but stresses the need for better evaluation frameworks, multi-factor analyses, and randomness-aware benchmarks to advance the field.

Abstract

Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-learning or few-shot learning, aims to effectively train a model using only a small number of labelled samples. However, these approaches have been observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. The randomness negatively affects the stability of the models, leading to large variances in results across training runs. When such sensitivity is disregarded, it can unintentionally, but unfortunately also intentionally, create an imaginary perception of research progress. Recently, this area has started to attract research attention, and the number of relevant studies is continuously growing. In this survey, we provide a comprehensive overview of 415 papers addressing the effects of randomness on the stability of learning with limited labelled data. We distinguish between four main tasks addressed in the papers (investigate/evaluate; determine; mitigate; benchmark/compare/report randomness effects), providing findings for each one. Furthermore, we identify and discuss seven challenges and open problems together with possible directions to facilitate further research. The ultimate goal of this survey is to emphasise the importance of this growing research area, which so far has not received an appropriate level of attention, and to reveal impactful directions for future research.

Paper Structure (56 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: (a) The effects of randomness can be addressed at various depths in the papers. (b) If not taken into consideration, randomness can introduce bias into comparison results, causing one approach to show better results only due to random chance and unintentional cherry-picking.
  • Figure 2: Process for identifying and categorising papers for this survey. Due to strong clustering effects, additional papers are identified using a reference analysis on the most relevant identified papers, i.e., including papers cited in them, as well as the ones that cite them.
  • Figure 3: Number of papers dealing with randomness, grouped by year. One panel shows only the core papers that address the effects of randomness in more detail, while the other also includes the papers that only recognise the problem. The problem started to attract attention only in recent years. The count for 2024 covers papers published until July 2024.
  • Figure 4: Different tasks for addressing the effects of randomness, along with their inputs, outputs and the relations between them. The dashed lines represent relations between the tasks that currently do not exist but need to be considered in the optimal state of addressing the effects of randomness, e.g., using more sophisticated evaluation when determining the effectiveness of mitigation strategies. Related tasks are grouped together, e.g., the evaluate task is placed near the investigate task.