What's New in My Data? Novelty Exploration via Contrastive Generation

Masaru Isonuma, Ivan Titov

TL;DR

This study introduces the task of novelty discovery through generation, which aims to identify novel properties of a fine-tuning dataset by generating examples that illustrate them. It proposes Contrastive Generative Exploration (CGE), which contrasts a pre-trained model with the same model after fine-tuning, together with an iterative version of CGE that promotes diversity in the generated examples.

Abstract

Fine-tuning is widely used to adapt language models for specific goals, often leveraging real-world data such as patient records, customer-service interactions, or web content in languages not covered in pre-training. These datasets are typically massive, noisy, and often confidential, making their direct inspection challenging. However, understanding them is essential for guiding model deployment and informing decisions about data cleaning or suppressing any harmful behaviors learned during fine-tuning. In this study, we introduce the task of novelty discovery through generation, which aims to identify novel properties of a fine-tuning dataset by generating examples that illustrate these properties. Our approach, Contrastive Generative Exploration (CGE), assumes no direct access to the data but instead relies on a pre-trained model and the same model after fine-tuning. By contrasting the predictions of these two models, CGE can generate examples that highlight novel characteristics of the fine-tuning data. However, this simple approach may produce examples that are too similar to one another, failing to capture the full range of novel phenomena present in the dataset. We address this by introducing an iterative version of CGE, where the previously generated examples are used to update the pre-trained model, and this updated model is then contrasted with the fully fine-tuned model to generate the next example, promoting diversity in the generated outputs. Our experiments demonstrate the effectiveness of CGE in detecting novel content, such as toxic language, as well as new natural and programming languages. Furthermore, we show that CGE remains effective even when models are fine-tuned using differential privacy techniques.
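
The contrast between the two models can be illustrated with a minimal sketch. It assumes each model exposes next-token probabilities over a shared vocabulary and scores each token by the difference in log probabilities between the fine-tuned and pre-trained models. The toy vocabulary, the probability values, and the greedy argmax selection are illustrative assumptions, not the authors' implementation or decoding strategy.

```python
import math

def contrastive_scores(logp_finetuned, logp_pretrained):
    """Per-token score: log p_finetuned(token) - log p_pretrained(token)."""
    return {tok: logp_finetuned[tok] - logp_pretrained[tok] for tok in logp_finetuned}

def pick_next_token(logp_finetuned, logp_pretrained):
    """Greedy choice of the highest-scoring token (an assumption made for simplicity)."""
    scores = contrastive_scores(logp_finetuned, logp_pretrained)
    return max(scores, key=scores.get)

# Hypothetical next-token distributions: fine-tuning shifted some mass to "gato".
p_pre = {"the": 0.60, "cat": 0.39, "gato": 0.01}
p_ft = {"the": 0.55, "cat": 0.35, "gato": 0.10}

logp_pre = {tok: math.log(p) for tok, p in p_pre.items()}
logp_ft = {tok: math.log(p) for tok, p in p_ft.items()}

scores = contrastive_scores(logp_ft, logp_pre)
next_token = pick_next_token(logp_ft, logp_pre)
print(next_token)
```

In this toy setting, "the" remains the most probable token under the fine-tuned model, but the contrastive score favors "gato" because its probability grew the most relative to the pre-trained model. That is the sense in which the contrast surfaces what is new rather than what is merely likely.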

Paper Structure

This paper contains 35 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Outline of Contrastive Generative Exploration (CGE). Consider a model pre-trained on English text and then fine-tuned on a multilingual corpus, where a small portion of the data consists of non-English text. CGE calculates the difference in log probabilities between the fine-tuned and pre-trained models. This allows for generating examples that represent novel properties of the fine-tuning dataset. Optionally, an iterative version of CGE trains the pre-trained model on each previously generated example and then contrasts the updated model with the fully fine-tuned model to generate the next example. This prevents the generation of examples similar to those already produced, thereby enhancing the diversity of the generated outputs.
  • Figure 2: Change in the detection and coverage rates across different numbers of generated examples for the non-English dataset of OpenLLaMA. The line represents the average across four runs, and the shaded area corresponds to the 95% confidence region.
  • Figure 3: Change in the detection and coverage rates across different values of the noise multiplier. The line represents the average across four runs, and the shaded area corresponds to the 95% confidence region.
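
The iterative procedure outlined in the Figure 1 caption can be sketched on a toy example. Everything here is a simplifying assumption: each "model" is a single probability table over four tokens, each "generated example" is a single token, and `toy_update` (a hypothetical helper) stands in for training the pre-trained model on the generated example by moving probability mass toward it. It is not the authors' training procedure.

```python
import math

def contrastive_pick(p_finetuned, p_reference):
    """Return the token with the largest log-probability gain over the reference."""
    return max(
        p_finetuned,
        key=lambda tok: math.log(p_finetuned[tok]) - math.log(p_reference[tok]),
    )

def toy_update(p_reference, token, lr=0.5):
    """Stand-in for a training step: mix the reference with a one-hot on `token`."""
    return {
        tok: (1 - lr) * p + (lr if tok == token else 0.0)
        for tok, p in p_reference.items()
    }

def iterative_cge(p_finetuned, p_pretrained, steps):
    """Generate `steps` examples, updating the reference model after each one."""
    reference = dict(p_pretrained)
    generated = []
    for _ in range(steps):
        tok = contrastive_pick(p_finetuned, reference)
        generated.append(tok)
        reference = toy_update(reference, tok)
    return generated

# Hypothetical distributions: fine-tuning introduced two novel tokens.
p_pre = {"the": 0.58, "cat": 0.40, "gato": 0.01, "chat": 0.01}
p_ft = {"the": 0.50, "cat": 0.30, "gato": 0.12, "chat": 0.08}

repeated = [contrastive_pick(p_ft, p_pre) for _ in range(2)]  # no update between picks
generated = iterative_cge(p_ft, p_pre, steps=2)
print(repeated, generated)
```

Without the update, the contrast selects the same token twice. With the update, the first pick is absorbed into the reference model, so the second pick moves on to the other novel token, which mirrors the diversity argument in the caption.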