Evaluating Data Influence in Meta Learning

Chenyang Ren; Huanyi Xie; Shu Yang; Meng Ding; Lijie Hu; Di Wang

TL;DR

This work introduces a data attribution framework for meta learning under bilevel optimization (BLO). It derives task influence functions (task-IF) and instance influence functions (instance-IF) to quantify how tasks and individual data points influence both meta-parameters and task-specific parameters. It formulates the total gradient and total Hessian to capture direct and indirect effects across the inner and outer levels, and proposes acceleration techniques, including Neumann-series approximations and a Γ-based total Hessian surrogate, to scale to large models. Empirical results on Omniglot, MNIST, and mini-ImageNet show that task-IF can achieve near-retraining accuracy with substantial runtime savings, while instance-IF enables efficient evaluation and editing of training data, including the identification and removal of harmful data. The framework thus supports data quality, interpretability, and efficiency in meta learning, with potential applications to data cleaning, task selection, and model editing in BLO-based systems.

Abstract

As one of the most fundamental learning paradigms, meta learning aims to address few-shot learning challenges effectively. However, it still faces significant issues related to its training data, such as training inefficiency caused by numerous low-contribution tasks in large datasets and substantial noise from incorrect labels. Training data attribution methods are therefore needed for meta learning. The bilevel structure of meta learning complicates the modeling of training data contributions, because meta-parameters and task-specific parameters influence each other, which makes existing data influence evaluation tools inapplicable or inaccurate. To address these challenges, we build on the influence function and propose a general data attribution evaluation framework for meta learning within the bilevel optimization framework. Our approach introduces task influence functions (task-IF) and instance influence functions (instance-IF) to accurately assess the impact of specific tasks and individual data points in closed form. This framework comprehensively models data contributions across both the inner and outer training processes, capturing the direct effects of data points on meta-parameters as well as their indirect influence through task-specific parameters. We also provide several strategies to enhance computational efficiency and scalability. Experimental results on several downstream tasks demonstrate the framework's effectiveness in training data evaluation.
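
Influence functions generally require the product of an inverse Hessian with a vector, and the TL;DR names Neumann-series approximations as one way to avoid forming that inverse explicitly. The following is a minimal sketch of the generic truncated Neumann series for H^{-1} v, not the paper's task-IF or instance-IF estimator. The function names (`matvec`, `neumann_ihvp`), the 2x2 matrix, and the step size `alpha` are all illustrative assumptions; the series converges only when the spectral radius of I - alpha*H is below 1.

```python
def matvec(H, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def neumann_ihvp(hvp, v, alpha, steps):
    """Approximate H^{-1} v with a truncated Neumann series.

    Uses H^{-1} v = alpha * sum_k (I - alpha*H)^k v, evaluated with the
    recursion p <- v + (I - alpha*H) p. Only Hessian-vector products
    (the callable `hvp`) are needed, never H^{-1} itself.
    """
    p = list(v)
    for _ in range(steps):
        hp = hvp(p)
        p = [vi + pi - alpha * hi for vi, pi, hi in zip(v, p, hp)]
    return [alpha * pi for pi in p]


# Hypothetical toy problem: a small symmetric positive-definite "Hessian".
H = [[2.0, 0.5], [0.5, 1.0]]
v = [1.0, -1.0]

approx = neumann_ihvp(lambda x: matvec(H, x), v, alpha=0.4, steps=200)

# Sanity check: H @ approx should reproduce v up to truncation error.
residual = [hi - vi for hi, vi in zip(matvec(H, approx), v)]
```

In a large model the `hvp` callable would typically come from automatic differentiation rather than an explicit matrix, so the cost per step is comparable to a gradient evaluation.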
