Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Young Hyun Yoo; Jii Cha; Changhyeon Kim; Taeuk Kim

TL;DR

Hyper-CL addresses the challenge of conditioning sentence representations on multiple perspectives without incurring the cost of cross- or bi-encoders. It uses a hypernetwork to generate condition-specific projection matrices, transforming precomputed sentence embeddings into condition-aware subspaces and optimizing with contrastive objectives in those subspaces. The approach narrows the performance gap to bi-encoders on C-STS and KGC while delivering substantial runtime and memory efficiency compared to traditional tri-encoders, aided by low-rank hypernetwork approximations and caching. Analyses show effective subspace clustering, strong generalization to unseen conditions, and ablations confirming the value of combining hypernetworks with contrastive learning for conditioned representations.
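As a rough illustration of the idea summarized above, the following minimal sketch shows how a hypernetwork could turn a condition embedding into a low-rank projection that is applied to a precomputed sentence embedding, with the generated factors cached per condition. Everything here is an assumption made for illustration: the names (`HYPER_A`, `HYPER_B`, `factors_for`, `condition_sentence`), the toy sizes, the random weights, and the choice of two linear maps producing the factors are hypothetical and are not taken from the paper's implementation.

```python
import random
from functools import lru_cache

DIM, RANK = 4, 2  # toy embedding size and low-rank bottleneck (hypothetical values)

random.seed(0)


def rand_matrix(rows, cols):
    """Small random matrix standing in for trained hypernetwork weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical hypernetwork: two linear maps from a condition embedding to the
# flattened low-rank factors A (DIM x RANK) and B (RANK x DIM), so that the
# condition-specific projection is W_c = A @ B.
HYPER_A = rand_matrix(DIM * RANK, DIM)
HYPER_B = rand_matrix(RANK * DIM, DIM)


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols nested list."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


@lru_cache(maxsize=None)
def factors_for(condition_embedding):
    """Generate (and cache) the low-rank factors for one condition.

    The argument must be a tuple so that it is hashable for the cache.
    """
    a = reshape(matvec(HYPER_A, condition_embedding), DIM, RANK)
    b = reshape(matvec(HYPER_B, condition_embedding), RANK, DIM)
    return a, b


def condition_sentence(sentence_embedding, condition_embedding):
    """Project a precomputed sentence embedding into the condition's subspace."""
    a, b = factors_for(tuple(condition_embedding))
    low = matvec(b, sentence_embedding)  # DIM -> RANK
    return matvec(a, low)                # RANK -> DIM
```

In this sketch the sentence encoder never runs again when the condition changes: only the small projection is regenerated, and repeated conditions reuse the cached factors.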

Abstract

While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives. In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations. In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions. Evaluation on two representative conditioning benchmarks, namely conditional semantic text similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations, showcasing its computational efficiency at the same time. We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms.

Paper Structure (22 sections, 5 equations, 4 figures, 8 tables)

Introduction
Background and Related Work
Proposed Method: Hyper-CL
Motivation
Framework and Training Procedure
Contrastive Learning in Subspaces
C-STS
KGC
Optimization of Hypernetworks
Caching Conditioning Networks
Experiments
Conditional Semantic Textual Similarity
Knowledge Graph Completion
Analysis
Efficiency Comparison between Bi-Encoder and Tri-Encoder
...and 7 more sections

Figures (4)

Figure 1: Illustration of our approach dubbed Hyper-CL. In the example, two sentences are provided along with two distinct conditions, $c_{high}$ and $c_{low}$. Specifically, $c_{high}$ (orange) denotes a condition that results in the sentences being interpreted more similarly, whereas $c_{low}$ (blue) leads to a perspective in which the two sentences are understood as being relatively more distinct. The identical pair of sentences are projected into different subspaces that reflect the provided conditions.
Figure 2: Four different types of architectures applicable for conditioning tasks. They utilize the [CLS] token embeddings from the encoder as representations of inputs. From left to right: the cross-encoder architecture encodes a triplet containing two sentences ($s_1, s_2$) and a condition ($c$) as a whole. In the bi-encoder setting, two sentence-condition pairs ($s_1, c$) and ($s_2, c$) are processed individually. The tri-encoder configuration regards $s_1$, $s_2$, and $c$ as independent and encodes them separately, followed by extra merging operations (e.g., Hadamard product). Finally, Hyper-CL resembles the tri-encoder, but innovatively incorporates a hypernetwork responsible for constructing projection matrices to condition sentences $s_1$ and $s_2$, based on the embedding of the condition $c$.
Figure 3: Training procedure of Hyper-CL. It introduces a hypernetwork $q$ to construct the weights of multi-layer perceptrons (MLPs), i.e., $g$, based on the condition. The MLPs are then used to project sentence embeddings onto subspaces, resulting in condition-aware sentence embeddings. Hyper-CL is trained with a contrastive objective, utilizing pairs of condition-aware sentence embeddings, one with a high condition $c_{high}$ and the other with a low condition $c_{low}$. Note that every embedding is the output of the same encoder $f$.
Figure 4: Visualization of the clusters of sentence embeddings before (top) and after (bottom) projection onto condition subspaces by Hyper-CL.
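
The training procedure described in the caption of Figure 3 can be sketched as a two-way contrastive loss: the same sentence pair is projected under $c_{high}$ and under $c_{low}$, and the loss favors the pair being more similar in the $c_{high}$ subspace. This is a minimal sketch under stated assumptions, not the paper's exact objective: the InfoNCE-style form, the cosine similarity, the temperature value, and the names `subspace_contrastive_loss`, `w_high`, and `w_low` (standing in for projection matrices produced by the hypernetwork) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project(matrix, vec):
    """Apply a condition-specific projection matrix to a sentence embedding."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def subspace_contrastive_loss(s1, s2, w_high, w_low, temperature=0.05):
    """Assumed InfoNCE-style loss over one sentence pair and two conditions.

    The pair projected with w_high acts as the positive and the same pair
    projected with w_low acts as the negative.
    """
    sim_high = cosine(project(w_high, s1), project(w_high, s2))
    sim_low = cosine(project(w_low, s1), project(w_low, s2))
    logit_high = sim_high / temperature
    logit_low = sim_low / temperature
    # Numerically stable log-sum-exp over the two logits.
    m = max(logit_high, logit_low)
    log_denominator = m + math.log(math.exp(logit_high - m) + math.exp(logit_low - m))
    return log_denominator - logit_high
```

The loss approaches zero when the pair is much more similar under $c_{high}$ than under $c_{low}$, and it equals $\log 2$ when both projections give the same similarity.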
