NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval

Sepanta Zeighami; Zac Wellmer; Aditya Parameswaran

TL;DR

NUDGE is a family of novel non-parametric embedding fine-tuning approaches that are significantly more accurate and efficient than both sets of existing approaches: fine-tuning the pre-trained model and training adaptor models.

Abstract

$k$-Nearest Neighbor search on dense vector embeddings ($k$-NN retrieval) from pre-trained embedding models is the predominant retrieval method for text and images, as well as Retrieval-Augmented Generation (RAG) pipelines. In practice, application developers often fine-tune the embeddings to improve their accuracy on the dataset and query workload at hand. Existing approaches either fine-tune the pre-trained model itself or, more efficiently but at the cost of accuracy, train adaptor models to transform the output of the pre-trained model. We present NUDGE, a family of novel non-parametric embedding fine-tuning approaches that are significantly more accurate and efficient than both sets of existing approaches. NUDGE directly modifies the embeddings of data records to maximize the accuracy of $k$-NN retrieval. We present a thorough theoretical and experimental study of NUDGE's non-parametric approach. We show that even though the underlying problem is NP-Hard, constrained variations can be solved efficiently. These constraints additionally ensure that the changes to the embeddings are modest, avoiding large distortions to the semantics learned during pre-training. In experiments across five pre-trained models and nine standard text and image retrieval datasets, NUDGE runs in minutes and often improves NDCG@10 by more than 10% over existing fine-tuning methods. On average, NUDGE provides a 3.3x and 4.3x higher increase in accuracy and runs 200x and 3x faster than fine-tuning the pre-trained model and training adaptors, respectively.
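To make the retrieval setting and the reported metric concrete, the following is a minimal sketch of $k$-NN retrieval over embeddings and of NDCG@$k$ with binary relevance. It is an illustration, not the paper's evaluation code: the function names (`inner`, `knn_retrieve`, `ndcg_at_k`), the use of inner product as the similarity, and the toy vectors are all assumptions made here.

```python
import math

def inner(u, v):
    """Inner product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))

def knn_retrieve(query, data, k):
    """Return the indices of the k data embeddings most similar to the query."""
    scores = [inner(query, d) for d in data]
    # sorted() is stable, so ties are broken by the lower index.
    order = sorted(range(len(data)), key=lambda i: -scores[i])
    return order[:k]

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: discounted gain over the ideal gain."""
    relevant = set(relevant_ids)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, i in enumerate(ranked_ids[:k]) if i in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example (hypothetical 2-d embeddings).
data = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.0, 1.0]
top = knn_retrieve(query, data, k=2)   # record 1 scores 1.0, record 2 scores 0.8
score = ndcg_at_k(top, relevant_ids=[2], k=2)
```

In this toy case the relevant record is retrieved at rank 2 rather than rank 1, so NDCG@2 is below 1; fine-tuning the embeddings aims to move such records up the ranking.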

Paper Structure

This paper contains 30 sections, 7 theorems, 78 equations, 2 figures, 14 tables, 1 algorithm.

Introduction
Preliminaries
Non-Parametric Embedding Fine-Tuning
Unconstrained Non-Parametric Embedding Fine-Tuning Problems
NUDGE Approaches
NUDGE-M: NUDGE with Bounded Magnitude
NUDGE-N: NUDGE with Normalized Embeddings
Experiments
Baseline Results
Out-of-distribution Generalization
Ablation Study
Related Work
Conclusion
Appendix Overview
Proofs
...and 15 more sections

Key Result

Theorem 1

MaxA-EFT is NP-Hard.

Figures (2)

Figure 1: NUDGEs change embeddings within a constrained region to maximize similarity with training queries. Data embeddings are colored based on queries for which they are the ground-truth answers.
Figure 2: Training and validation accuracy for BGE-S on three datasets
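The idea in the Figure 1 caption, moving each data embedding within a constrained region toward the training queries it answers, can be illustrated with a simplified sketch. This is not the paper's NUDGE-M or NUDGE-N algorithm. It assumes inner-product similarity and a single hypothetical bound `gamma` on how far any embedding may move, and it maximizes only the summed similarity to the training queries. Under those assumptions the best shift for a record is `gamma` times the unit vector along the sum of its queries. All names (`nudge_embeddings`, `answers`, `top1`) are hypothetical.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def nudge_embeddings(data, train_queries, answers, gamma):
    """Shift each data embedding by at most `gamma` toward its training queries.

    answers[j] is the index of the ground-truth record for train_queries[j].
    Records with no training query are left unchanged.
    """
    dim = len(data[0])
    # Sum of the queries each record should answer; this is the direction
    # that increases the summed inner-product similarity fastest.
    directions = [[0.0] * dim for _ in data]
    for q, i in zip(train_queries, answers):
        for d in range(dim):
            directions[i][d] += q[d]

    nudged = []
    for emb, g in zip(data, directions):
        norm = math.sqrt(inner(g, g))
        if norm == 0.0:
            nudged.append(list(emb))
        else:
            # Move exactly gamma along the unit direction (boundary of the region).
            nudged.append([e + gamma * x / norm for e, x in zip(emb, g)])
    return nudged

def top1(query, data):
    """Index of the most similar data embedding."""
    return max(range(len(data)), key=lambda i: inner(query, data[i]))

# Toy example: the training query's ground-truth answer is record 0,
# but record 1 is initially more similar to it.
data = [[1.0, 0.0], [0.8, 0.6]]
train_queries = [[0.6, 0.8]]
answers = [0]

before = top1(train_queries[0], data)      # record 1 wins before the shift
nudged = nudge_embeddings(data, train_queries, answers, gamma=0.5)
after = top1(train_queries[0], nudged)     # record 0 wins after the shift
```

Keeping `gamma` small limits how far an embedding drifts from its pre-trained position, which is the role the abstract attributes to the constraints. How the bound is actually chosen, and how accuracy rather than similarity is optimized, is not covered by this sketch.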

Theorems & Definitions (7)

Theorem 1
Theorem 2
Theorem 3
Lemma 1
Lemma 2
Lemma 3
Lemma 4
