WeShap: Weak Supervision Source Evaluation with Shapley Values

Naiqing Guan, Nick Koudas

TL;DR

Revising the PWS pipeline with WeShap values improves downstream model accuracy by 5.0 points on average compared to previous state-of-the-art methods, underscoring the efficacy of WeShap values in enhancing data quality for training machine learning models.

Abstract

Efficient data annotation stands as a significant bottleneck in training contemporary machine learning models. The Programmatic Weak Supervision (PWS) pipeline presents a solution by utilizing multiple weak supervision sources to automatically label data, thereby expediting the annotation process. Given the varied contributions of these weak supervision sources to the accuracy of PWS, it is imperative to employ a robust and efficient metric for their evaluation. This is crucial not only for understanding the behavior and performance of the PWS pipeline but also for facilitating corrective measures. In our study, we introduce WeShap values as an evaluation metric, which quantifies the average contribution of weak supervision sources within a proxy PWS pipeline, leveraging the theoretical underpinnings of Shapley values. We demonstrate efficient computation of WeShap values using dynamic programming, achieving quadratic computational complexity relative to the number of weak supervision sources. Our experiments demonstrate the versatility of WeShap values across various applications, including the identification of beneficial or detrimental labeling functions, refinement of the PWS pipeline, and rectification of mislabeled data. Furthermore, WeShap values aid in comprehending the behavior of the PWS pipeline and scrutinizing specific instances of mislabeled data. Although initially derived from a specific proxy PWS pipeline, we empirically demonstrate the generalizability of WeShap values to other PWS pipeline configurations. Our findings indicate a noteworthy average improvement of 5.0 points in downstream model accuracy through the revision of the PWS pipeline compared to previous state-of-the-art methods, underscoring the efficacy of WeShap values in enhancing data quality for training machine learning models.

Paper Structure (22 sections, 2 theorems, 16 equations, 8 figures, 4 tables, 1 algorithm)

Key Result

Theorem 1

Consider a coalition game $\mathcal{G}$ where a majority voting (MV) model utilizes a set of LFs to label a data point $(x,y)$. The player set $I=\{1,...,m\}$ denotes the set of LFs, and the utility function $v'$ maps an LF coalition to the accuracy gain of the MV model using the LFs compared to ran
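To make the coalition game concrete, the following is a minimal brute-force sketch of Shapley values for labeling functions (LFs) under majority voting on a single data point. The theorem statement above is cut off, so the utility used here is an assumption rather than the paper's definition: it takes the probability that majority voting outputs the true label (ties split uniformly) minus the chance level `1 / n_classes`. The names `mv_utility` and `shapley_values` are hypothetical, and `None` is assumed to mark an abstaining LF. The sketch enumerates every coalition, so its cost is exponential in the number of LFs; it illustrates the quantity being computed, not the quadratic dynamic-programming computation described in the abstract.

```python
from itertools import combinations
from math import factorial


def mv_utility(votes, y, n_classes):
    """Assumed utility: P(majority vote == y) minus the chance level.

    `votes` holds the labels emitted by the LFs in a coalition;
    None means the LF abstains. Ties are split uniformly.
    """
    active = [v for v in votes if v is not None]
    if not active:
        return 0.0  # no votes: uniform guess, so no gain over chance
    counts = {}
    for v in active:
        counts[v] = counts.get(v, 0) + 1
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    p_correct = 1.0 / len(winners) if y in winners else 0.0
    return p_correct - 1.0 / n_classes


def shapley_values(lf_votes, y, n_classes):
    """Exact Shapley value of each LF by enumerating all coalitions."""
    m = len(lf_votes)
    values = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):  # k = size of the coalition without LF i
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for coalition in combinations(others, k):
                without_i = [lf_votes[j] for j in coalition]
                with_i = without_i + [lf_votes[i]]
                gain = (mv_utility(with_i, y, n_classes)
                        - mv_utility(without_i, y, n_classes))
                values[i] += weight * gain
    return values


# Three LFs on one binary data point whose true label is 1:
# two vote correctly, one votes incorrectly.
example_values = shapley_values([1, 1, 0], y=1, n_classes=2)
print(example_values)
```

In this toy case the two correct LFs receive equal positive values and the incorrect LF receives a negative value, and the values sum to the utility of the full coalition, which is the efficiency property of Shapley values.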

Figures (8)

  • Figure 1: A motivating example for comprehensive LF evaluation. LF 2 is essential for reducing classification errors, although its accuracy is around 0.5.
  • Figure 2: Illustration of the WeShap value computation.
  • Figure 3: Efficient computation of WeShap scores using dynamic programming. The target LF $\lambda$ is denoted in star; green and red cells represent LFs making correct and wrong predictions, respectively.
  • Figure 4: Average downstream model accuracy gain of WeShap compared to random baseline in ranking LFs.
  • Figure 5: Ranking LFs on selected datasets.
  • ...and 3 more figures

Theorems & Definitions (6)

  • Definition 1: Proxy Framework
  • Definition 2: Proxy Game
  • Theorem 1
  • Example
  • Theorem 2
  • Example