VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI

Soumi Das, Shubhadip Nag, Shreyyash Sharma, Suparna Bhattacharya, Sourangshu Bhattacharya

TL;DR

This work proposes VTruST, a controllable framework for data-centric trustworthy AI that lets users control the trade-offs between the different trustworthiness metrics of the constructed training datasets. To solve the underlying subset selection problem, it introduces a novel online version of the Orthogonal Matching Pursuit algorithm.

Abstract

Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications, with fairness, robustness, and accuracy being some of the key trustworthiness metrics. In this work, we propose VTruST, a controllable framework for data-centric trustworthy AI (DCTAI) that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets. A key challenge in implementing an efficient DCTAI framework is to design an online value-function-based training data subset selection algorithm. We pose the training data valuation and subset selection problem as an online sparse approximation formulation and propose a novel online version of the Orthogonal Matching Pursuit (OMP) algorithm for solving it. Experimental results show that VTruST outperforms the state-of-the-art baselines on social, image, and scientific datasets. We also show that the data values generated by VTruST can provide effective data-centric explanations for different trustworthiness metrics.
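
For readers unfamiliar with the sparse approximation step named in the abstract, the following is a minimal sketch of the classic batch OMP algorithm, not the online variant that VTruST proposes. The function name `omp`, its arguments, and the toy dictionary are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def omp(D, y, k):
    """Classic (batch) Orthogonal Matching Pursuit; illustrative only.

    D: (m, n) dictionary whose columns are atoms (hypothetical input).
    y: (m,) target vector to approximate.
    k: maximum number of atoms to select.
    Returns the sparse coefficient vector and the selected column indices.
    """
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Score every atom by its absolute correlation with the residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit the coefficients of all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, support

# Toy usage: with an identity dictionary, OMP recovers the nonzero entries.
D = np.eye(5)
y = np.array([0.0, 3.0, 0.0, -2.0, 0.0])
x, support = omp(D, y, k=2)
```

Each iteration greedily adds one atom and then re-solves the least-squares fit over the whole selected set, which is what distinguishes OMP from plain matching pursuit. How VTruST maps training points and value functions onto the dictionary and target is defined in the paper itself and is not reproduced here.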

Paper Structure

This paper contains 11 sections, 1 equation, 11 figures, 6 tables, and 3 algorithms.

Figures (11)

  • Figure 1: Controlling tradeoffs in trustworthiness metrics for social data - Adult Census.
  • Figure 2: Box plot representation of CF-gap.
  • Figure 3: Anecdotal samples from VTruST-R & SSR with High Distinctiveness and Uncertainty from TinyImagenet for class Compass.
  • Figure 4: Varying fraction of subsets. We report the error rate (ER) and disparities for different subset sizes selected by the proposed method VTruST-F and by SSFR. The proposed method consistently stays below the baselines in both error rate and disparity measures.
  • Figure 5: Error Rate-Fairness and Robustness-Fairness tradeoffs in the clean and augmented data setups. We plot each method along two dimensions, performance and disparity. The proposed method VTruST lies mostly in the bottom-left region (low error rate or robust error rate, low disparity), with disparity lowest for $\lambda=0$. Increasing $\lambda$ yields a lower error rate or robust error rate for the same fraction, at the cost of higher disparity.
  • ...and 6 more figures