ActiveDP: Bridging Active Learning and Data Programming

Naiqing Guan; Nick Koudas

TL;DR

ActiveDP is an interactive framework that combines data programming (DP) and active learning (AL) to produce labels with both high accuracy and broad coverage. It introduces three components: the ADP sampler for balanced query selection, LabelPick for efficient pruning of labeling functions (LFs) via a Markov blanket approach, and ConFusion for confidence-based label aggregation that draws on both DP and AL signals. Empirical results on textual and tabular datasets show that ActiveDP consistently outperforms state-of-the-art weak supervision and active learning baselines across diverse labeling budgets and remains robust under label noise. The results indicate that combining weak supervision with instance-level labeling improves downstream classifiers in practical labeling scenarios.

Abstract

Modern machine learning models require large labelled datasets to achieve good performance, but manually labelling large datasets is expensive and time-consuming. The data programming paradigm enables users to label large datasets efficiently but produces noisy labels, which degrade the downstream model's performance. The active learning paradigm, on the other hand, can acquire accurate labels but only for a small fraction of instances. In this paper, we propose ActiveDP, an interactive framework that bridges active learning and data programming to generate labels with both high accuracy and high coverage, combining the strengths of both paradigms. Experiments show that ActiveDP outperforms previous weak supervision and active learning approaches and performs consistently well under different labelling budgets.
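
To make the idea of confidence-based label aggregation concrete, here is a minimal sketch. It is not the paper's ConFusion algorithm, whose details are not given on this page. It assumes a simple rule: use the active learning model's prediction when its top class probability reaches a threshold, and otherwise fall back to the data programming label model. The function name `aggregate_by_confidence`, the fixed `threshold` of 0.8, and the input format (one list of class probabilities per instance) are all hypothetical choices made for illustration.

```python
def aggregate_by_confidence(al_probs, dp_probs, threshold=0.8):
    """Combine two sets of per-instance class probabilities.

    al_probs: probabilities from an active learning model (assumed input).
    dp_probs: probabilities from a data programming label model (assumed input).
    threshold: hypothetical confidence cutoff for trusting the AL model.

    Returns (labels, used_al), where used_al[i] is True when the AL
    prediction was chosen for instance i.
    """
    labels, used_al = [], []
    for al_row, dp_row in zip(al_probs, dp_probs):
        # Trust the AL model only when its top probability is high enough.
        confident = max(al_row) >= threshold
        row = al_row if confident else dp_row
        # Index of the largest probability in the chosen row.
        labels.append(max(range(len(row)), key=row.__getitem__))
        used_al.append(confident)
    return labels, used_al


al = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
dp = [[0.30, 0.70], [0.20, 0.80], [0.60, 0.40]]
labels, used_al = aggregate_by_confidence(al, dp)
print(labels)   # [0, 1, 1]
print(used_al)  # [True, False, True]
```

In this toy input the second instance falls back to the label model because the AL model's top probability (0.55) is below the cutoff. In practice the cutoff would more plausibly be tuned on held-out data than fixed by hand.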

Paper Structure (21 sections, 3 equations, 3 figures, 5 tables)

Introduction
Related Works
Data Programming
Active Learning
ActiveDP Framework
Framework Overview
Label Aggregation
Sample Selection Strategy
Label Function Selection
Experiments
Experiment Setup
Datasets.
Baselines.
Evaluation Protocol.
Simulated User.
...and 6 more sections

Figures (3)

Figure 1: Workflow of ActiveDP. Left: iterative LF creation at training phase. Right: label aggregation at inference phase.
Figure 2: Workflow of the LF selection module in ActiveDP.
Figure 3: End-to-end performance comparison between ActiveDP and baseline methods.
