Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement
Aaqib Saeed, Dimitris Spathis, Jungwoo Oh, Edward Choi, Ali Etemad
TL;DR
The paper tackles label noise in wearable time-series data for health sensing. It proposes Few-Shot Human-in-the-Loop Refinement (FHLR), a three-stage approach that seeds a model with weak (smoothed) labels, fine-tunes it on a small set of expert-labeled examples, and merges the seed and fine-tuned models via weighted parameter averaging. Across four health-related tasks, FHLR outperforms eight noisy-label baselines, with up to 19% higher accuracy under both symmetric and asymmetric label noise, and shows that simple parameter averaging can rival more complex ensembles. The method requires only a small amount of clean data and does not assume a specific noise distribution, making it a practical, scalable option for robust wearable health monitoring.
Abstract
Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities such as video, where the footage itself can be used to label objects or events, wearable data contain no obvious cues about the user's physical state and usually require rich metadata to annotate. As a result, label noise is a particularly thorny issue when labeling such data. In this paper, we propose a novel solution to noisy label learning, entitled Few-Shot Human-in-the-Loop Refinement (FHLR). Our method first learns a seed model using weak labels. Next, it fine-tunes the seed model using a handful of expert corrections. Finally, it achieves better generalizability and robustness by merging the seed and fine-tuned models via weighted parameter averaging. We evaluate our approach on four challenging tasks and datasets, and compare it against eight competitive baselines designed to deal with noisy labels. We show that FHLR performs significantly better when learning from noisy labels and achieves state-of-the-art results by a large margin, with up to 19% accuracy improvement under symmetric and asymmetric noise. Notably, we find that FHLR remains robust as label noise increases, unlike prior works that suffer severe performance degradation. Our work not only achieves better generalization on high-stakes health sensing benchmarks but also sheds light on how noise affects commonly used models.
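The merging step can be illustrated with a minimal sketch of weighted parameter averaging. This is not the authors' implementation: the function name merge_parameters, the mixing weight alpha, the convention that alpha weights the fine-tuned model, and the toy parameter values are all assumptions made for illustration. Parameters are represented as a dictionary of NumPy arrays keyed by name.

```python
import numpy as np


def merge_parameters(seed_params, refined_params, alpha=0.5):
    """Return alpha * refined + (1 - alpha) * seed for every named parameter.

    `alpha` is a hypothetical mixing weight; the value and convention used
    in the paper are not stated in this section.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if seed_params.keys() != refined_params.keys():
        raise ValueError("both models must have the same parameter names")

    merged = {}
    for name, seed_value in seed_params.items():
        refined_value = refined_params[name]
        if seed_value.shape != refined_value.shape:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two sets of weights.
        merged[name] = alpha * refined_value + (1.0 - alpha) * seed_value
    return merged


# Toy parameters standing in for a seed model (trained on weak labels)
# and a fine-tuned model (refined on a few expert corrections).
seed = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
refined = {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}

merged = merge_parameters(seed, refined, alpha=0.25)
```

With alpha=0.25 the merged model stays closer to the seed model; alpha=0 recovers the seed parameters and alpha=1 the fine-tuned ones. Averaging of this kind only makes sense when both models share the same architecture, which holds here because the fine-tuned model starts from the seed model.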
