DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aishwarya Naresh Reganti, Aman Chadha

TL;DR

DPO-Kernels advances direct preference optimization by injecting kernelized representations and embedding-based semantics into the alignment objective. It broadens the divergence toolbox beyond KL with Jensen–Shannon, Hellinger, Wasserstein, and others, and introduces data-driven selection to pick kernel-divergence pairs, plus a Hierarchical Mixture of Kernels to balance local and global structure. Empirically, it achieves state-of-the-art generalization across 12 diverse datasets for factuality, safety, reasoning, and instruction following, while grounding analysis in Heavy-Tailed Self-Regularization. The framework emphasizes robustness, interpretability, and potential multimodal extensions, albeit with higher computational costs and notable ethical considerations that warrant careful mitigation.

Abstract

The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Rényi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local precision and global modeling. Evaluations on 12 datasets demonstrate state-of-the-art performance in factuality, safety, reasoning, and instruction following. Grounded in Heavy-Tailed Self-Regularization, DPO-Kernels maintains robust generalization for LLMs, offering a comprehensive resource for further alignment research.
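
To make the divergence alternatives named in contribution (ii) concrete, the following is a minimal sketch of their textbook definitions for two discrete distributions. It is not the paper's implementation: the function names, the default `alpha`, and the unit-spaced 1-D grid assumed for the Wasserstein distance are illustrative assumptions, and how these terms enter the DPO objective is not shown here.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in nats. Assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    """Symmetric and bounded by ln 2, unlike plain KL."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    """Hellinger distance, always in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q):
    """Negative log of the Bhattacharyya coefficient sum(sqrt(p * q))."""
    return float(-np.log(np.sum(np.sqrt(p * q))))

def renyi(p, q, alpha=0.5):
    """Renyi divergence of order alpha (alpha != 1); alpha is an assumed default."""
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))

def wasserstein_1d(p, q):
    """W1 distance, assuming both distributions live on the same unit-spaced 1-D grid."""
    return float(np.sum(np.abs(np.cumsum(p - q))))

if __name__ == "__main__":
    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.1, 0.3, 0.6])
    for name, fn in [("KL", kl), ("JS", jensen_shannon), ("Hellinger", hellinger),
                     ("Bhattacharyya", bhattacharyya), ("Renyi(0.5)", renyi),
                     ("Wasserstein-1", wasserstein_1d)]:
        print(f"{name:>14}: {fn(p, q):.4f}")
```

Two identities are useful sanity checks when comparing these quantities: the squared Hellinger distance equals one minus the Bhattacharyya coefficient, and the Rényi divergence of order 0.5 is exactly twice the Bhattacharyya distance.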

Paper Structure (188 sections, 184 equations, 26 figures, 14 tables)

This paper contains 188 sections, 184 equations, 26 figures, 14 tables.

Figures (26)

  • Figure 1: Kernel methods are techniques in machine learning that implicitly map input data into a higher-dimensional feature space without explicitly performing the transformation. This is achieved through kernels, which are functions that compute the inner product of two data points in the transformed feature space. For better intuition on gradient descent dynamics on kernel-induced loss landscapes, cf. \ref{sec:appendix:loss_landscape}.
  • Figure 2: The plot illustrates the oscillatory behavior and trends of various divergence measures, including Wasserstein, Jensen-Shannon, Hellinger, Rényi, Bhattacharyya, and f-divergence, as training progresses, reflecting their sensitivity to the evolving alignment dynamics.
  • Figure 3: Visualization of the four proposed metrics for kernel selection in alignment tasks. (a) Positive-Negative Divergence (PND) illustrates the divergence between alignment scores for positive and negative samples, indicating the degree of separability. (b) Positive-Negative Alignment Variance (PNAV) depicts the variance in alignment scores for positive and negative samples, reflecting alignment consistency. (c) Triplet Alignment Tightness (TAT) shows the relative positioning of query ($x$), positive ($y^+$), and negative ($y^-$) embeddings in the latent space, highlighting alignment precision. (d) Normalized Alignment Gap (NAG) tracks the evolution of alignment gaps over samples, where smaller NAG values signify better alignment quality. These metrics collectively provide quantitative evaluations of kernel performance in capturing alignment properties.
  • Figure 4: Visualization of the four key metrics for divergence selection: (1) Support Overlap — Heatmap representing the overlap between two distributions, highlighting shared support regions; (2) Drift Magnitude — Illustration of the shift in the mean of a distribution over time, showcasing how drift is detected; (3) Kurtosis — Bar plot comparing kurtosis values for normal, heavy-tailed, and light-tailed distributions, quantifying the "tailedness" of each distribution; (4) Smoothness — Visualization of a smooth function and its derivative, where smoother functions exhibit smaller, less abrupt changes in derivatives. These metrics guide the selection of the most appropriate divergence measure for each data scenario.
  • Figure 5: Evolution of Kernel Weights in the Mixture Over 200 Epochs. The plot illustrates the dynamic adjustment of kernel weights ($\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$) corresponding to Polynomial, RBF, Spectral, and Mahalanobis kernels, respectively, during training. Each curve represents the relative contribution of a kernel, showing how the model adapts its alignment strategy over time. The dominance of one or two kernels, as indicated by the curves, highlights the tendency towards kernel collapse, where certain kernels overshadow others. This visualization underscores the challenges in maintaining kernel diversity within the mixture.
  • ...and 21 more figures
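
The kernel idea described in the Figure 1 caption, and the weighted kernel mixture described in the Figure 5 caption, can be illustrated with a small sketch. It checks that a degree-2 polynomial kernel equals an inner product in an explicit 6-dimensional feature space, and that a convex combination of two kernels still yields a positive semidefinite Gram matrix. The helper names, the choice of `gamma`, and the softmax parameterization of the mixture weights are assumptions for illustration only; the paper's own kernels and weighting scheme may differ.

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel (x . y + 1)^2, computed without any feature map."""
    return (np.dot(x, y) + 1.0) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is an assumed bandwidth."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    return np.array([[kernel(a, b) for b in X] for a in X])

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mixture_gram(X, logits):
    """Convex combination of the two Gram matrices.

    Softmax weights are an assumed way to keep the weights positive and
    summing to one; they stand in for the lambda_i of Figure 5.
    """
    lam = softmax(np.asarray(logits, dtype=float))
    return lam[0] * gram(poly2_kernel, X) + lam[1] * gram(rbf_kernel, X)

if __name__ == "__main__":
    x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(poly2_kernel(x, y), np.dot(poly2_features(x), poly2_features(y)))

    X = np.random.default_rng(0).normal(size=(5, 2))
    eigenvalues = np.linalg.eigvalsh(mixture_gram(X, logits=[0.0, 1.0]))
    print("smallest eigenvalue of mixture Gram matrix:", eigenvalues.min())
```

Because a nonnegative weighted sum of valid kernels is itself a valid kernel, the mixture stays positive semidefinite for any weights; if one logit grows much larger than the others, its weight approaches one, which is the collapse behavior the Figure 5 caption describes.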