ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation
Chen Huang, Yiping Jin, Ilija Ilievski, Wenqiang Lei, Jiancheng Lv
TL;DR
Araida tackles data annotation under limited labeled data by integrating an annotation model with a KNN-based analogical reasoning reference. It introduces an error-aware integration strategy that weights the two sources dynamically for each example, guided by neighborhood error signals and local density, and optimizes the components jointly via a coordinated loss with differentiable KNN support. Across word- and sentence-level tasks, Araida consistently reduces the number of required human corrections, achieving an average MCA improvement of 11.02% and remaining robust across both classic and LLM-based annotation models. The framework is modular: it improves annotation accuracy while remaining compatible with various datasets, annotation models, and active-learning setups, which makes annotation more efficient in resource-constrained settings.
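The "differentiable KNN support" mentioned above is not specified in this summary. A common way to make KNN voting differentiable is to weight the k nearest stored examples by a softmax over their negative distances, so the resulting label distribution is a smooth function of the embeddings. The sketch below illustrates that generic relaxation only; the squared Euclidean distance, the `temperature` parameter, and the function name `soft_knn_distribution` are assumptions for illustration, not Araida's published formulation.

```python
import math
from typing import List, Sequence


def soft_knn_distribution(query: Sequence[float],
                          memory: List[Sequence[float]],
                          labels: List[int],
                          num_classes: int,
                          k: int = 3,
                          temperature: float = 1.0) -> List[float]:
    """Hypothetical soft-KNN label distribution (illustrative sketch).

    Each of the k nearest stored examples votes for its label with a weight
    given by a softmax over negative squared Euclidean distances, so the
    output varies smoothly with the distances (unlike hard majority voting).
    """
    # Squared Euclidean distance from the query to every stored example.
    dists = [sum((q - m) ** 2 for q, m in zip(query, mem)) for mem in memory]
    # Indices of the k closest stored examples.
    nearest = sorted(range(len(memory)), key=lambda i: dists[i])[:k]
    # Softmax over -distance / temperature, shifted by the max for stability.
    logits = [-dists[i] / temperature for i in nearest]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    # Accumulate each neighbor's normalized weight onto its label.
    probs = [0.0] * num_classes
    for i, e in zip(nearest, exps):
        probs[labels[i]] += e / total
    return probs


memory = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
labels = [0, 0, 1, 1]
p = soft_knn_distribution([0.05, 0.0], memory, labels, num_classes=2, k=3)
print(p)  # most of the mass goes to class 0, whose examples are closest
```

Lowering `temperature` pushes the weights toward a hard nearest-neighbor vote; raising it approaches a uniform vote over the k neighbors.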
Abstract
Human annotation is time-consuming and labor-intensive. To reduce this burden, interactive data annotation uses an annotation model to provide suggestions that humans approve or correct. However, annotation models trained on limited labeled data are prone to generating incorrect suggestions, which creates extra human correction effort. To tackle this challenge, we propose Araida, an analogical reasoning-based approach that improves automatic annotation accuracy in the interactive data annotation setting and reduces the need for human corrections. Araida uses an error-aware integration strategy that dynamically coordinates an annotation model and a k-nearest neighbors (KNN) model, giving more weight to KNN's predictions when the annotation model's predictions are deemed inaccurate. Empirical studies demonstrate that Araida is adaptable to different annotation tasks and models. On average, it reduces human correction labor by 11.02% compared with vanilla interactive data annotation methods.
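To make the idea of "giving more weight to KNN's predictions when the annotation model is deemed inaccurate" concrete, the sketch below blends the two predicted distributions with a per-example weight. It is a deliberately simplified, hypothetical rule and not Araida's learned integration: here the KNN weight is simply the annotation model's error rate on the query's nearest already-annotated neighbors (the fraction whose model suggestion a human had to correct), and the local-density signal mentioned in the TL;DR is omitted. The function names `error_aware_mix` and `suggest_label` are illustrative.

```python
from typing import List, Sequence


def error_aware_mix(p_model: Sequence[float],
                    p_knn: Sequence[float],
                    neighbor_model_correct: Sequence[bool]) -> List[float]:
    """Hypothetical error-aware blend of two label distributions.

    The KNN weight `lam` equals the annotation model's error rate on the
    query's nearest annotated neighbors: the more often humans corrected the
    model nearby, the more the KNN prediction is trusted for this query.
    """
    if not neighbor_model_correct:
        lam = 0.0  # no neighborhood evidence: keep the annotation model's output
    else:
        errors = sum(1 for ok in neighbor_model_correct if not ok)
        lam = errors / len(neighbor_model_correct)
    # A convex combination keeps the result a valid probability distribution.
    return [(1.0 - lam) * pm + lam * pk for pm, pk in zip(p_model, p_knn)]


def suggest_label(probs: Sequence[float]) -> int:
    # The suggestion shown to the human annotator is the highest-scoring class.
    return max(range(len(probs)), key=lambda c: probs[c])


# The model was corrected on 3 of 4 nearby examples, so lam = 0.75.
mixed = error_aware_mix([0.7, 0.3], [0.1, 0.9], [False, False, True, False])
print(mixed, suggest_label(mixed))  # the KNN-preferred class 1 wins
```

When the model's nearby suggestions were all accepted, `lam` is 0 and the annotation model's own prediction is shown unchanged; a learned weighting, as the paper describes, would replace this fixed rule.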
