VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI

Soumi Das; Shubhadip Nag; Shreyyash Sharma; Suparna Bhattacharya; Sourangshu Bhattacharya

TL;DR

This work proposes VTruST, a controllable framework for data-centric trustworthy AI that lets users control the trade-offs between the trustworthiness metrics of the constructed training datasets. To solve the underlying training data subset selection problem, it introduces a novel online version of the Orthogonal Matching Pursuit algorithm.

Abstract

Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications, with fairness, robustness, and accuracy being some of the key trustworthiness metrics. In this work, we propose VTruST, a controllable framework for data-centric trustworthy AI (DCTAI) that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets. A key challenge in implementing an efficient DCTAI framework is designing an online, value-function-based algorithm for training data subset selection. We pose training data valuation and subset selection as an online sparse approximation problem, and we propose a novel online version of the Orthogonal Matching Pursuit (OMP) algorithm to solve it. Experimental results show that VTruST outperforms state-of-the-art baselines on social, image, and scientific datasets. We also show that the data values generated by VTruST can provide effective data-centric explanations for different trustworthiness metrics.
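
For readers unfamiliar with Orthogonal Matching Pursuit, the following is a minimal sketch of the classical batch algorithm, not the online variant proposed in this work. It greedily picks the atom most correlated with the current residual, then refits the coefficients of all selected atoms by least squares. The function names (`dot`, `solve`, `omp`) and the toy atoms are illustrative assumptions and do not come from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def omp(atoms, target, k):
    """Classical batch OMP: approximate `target` with at most k atoms.

    Assumes every atom is a nonzero vector and the selected atoms are
    linearly independent. Returns (selected indices, coefficients, residual).
    """
    selected, coeffs = [], []
    residual = target[:]
    for _ in range(k):
        candidates = [j for j in range(len(atoms)) if j not in selected]
        if not candidates:
            break

        # Greedy step: normalised correlation with the current residual.
        def score(j):
            return abs(dot(atoms[j], residual)) / math.sqrt(dot(atoms[j], atoms[j]))

        best = max(candidates, key=score)
        if score(best) < 1e-12:  # residual is already (numerically) explained
            break
        selected.append(best)
        # Orthogonal step: least-squares refit via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in selected] for i in selected]
        rhs = [dot(atoms[i], target) for i in selected]
        coeffs = solve(gram, rhs)
        residual = [
            t - sum(c * atoms[j][r] for c, j in zip(coeffs, selected))
            for r, t in enumerate(target)
        ]
    return selected, coeffs, residual


# Toy example: four atoms in R^3, target built from atoms 0 and 2.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
selected, coeffs, residual = omp(atoms, [2.0, 0.0, 3.0], k=2)
```

In this toy run the two atoms that generate the target are recovered and the residual vanishes. In a subset selection setting, the atoms would presumably correspond to per-datapoint quantities and the target to a value function, but that mapping is specific to the paper and is not reproduced here.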

Paper Structure (11 sections, 1 equation, 11 figures, 6 tables, 3 algorithms)

Introduction
VTruST : Value-driven Trustworthy AI through Selection of Training Data
A Controllable Value Function-based Framework for DCTAI
Value Functions for Trustworthy Data-centric AI
An online-OMP algorithm for online sparse approximation
Experimental Evaluation
Error rate, Fairness and Robustness on Social Data
Accuracy and Robustness on Image and Scientific Datasets
Data-centric analysis: Post hoc explanation
Discussion and Related works
Appendix

Figures (11)

Figure 1: Controlling tradeoffs in trustworthiness metrics for social data - Adult Census.
Figure 2: Box plot representation of CF-gap.
Figure 3: Anecdotal samples from VTruST-R & SSR with High Distinctiveness and Uncertainty from TinyImagenet for class Compass.
Figure 4: Varying fraction of subsets: Error rate (ER) and disparities for different subset sizes selected by the proposed method VTruST-F and by SSFR. The proposed method consistently stays below the baselines in both error rate and disparity measures.
Figure 5: Error rate-fairness and robustness-fairness trade-offs in the clean and augmented data setups: Performance of the methods along two dimensions, performance and disparity. The proposed method VTruST lies toward the bottom-left region (low error rate or robust error rate, low disparity), with the lowest disparity at $\lambda=0$. Giving more weight to $\lambda$ lowers the error rate or robust error rate for the same fraction while increasing disparity.
...and 6 more figures
