Triple-Encoders: Representations That Fire Together, Wire Together
Justus-Jonas Erker, Florian Mai, Nils Reimers, Gerasimos Spanakis, Iryna Gurevych
TL;DR
This work addresses the inefficiency of re-encoding entire dialog histories by introducing Contextualized Curved Contrastive Learning (C3L) via Triple-Encoders, which contextualizes independently encoded utterances through Hebbian-inspired co-occurrence learning without learnable weights, preserving linear inference complexity. By employing two before-spaces, [B1] and [B2], and a mean-pooling fusion, the model learns distributed mixtures that reflect sequential context better than standard bi-encoders do. Empirically, C3L yields substantial improvements over bi-encoders and better zero-shot generalization than single-vector representation models on DailyDialog and PersonaChat, along with strong short-term planning performance, while keeping inference efficiency comparable to prior CCL approaches. The authors release code and models, underscoring the practical impact of self-organizing, context-aware representations for dialogue and potentially other sequential text tasks.
Abstract
Search-based dialog models typically re-encode the dialog history at every turn, incurring high cost. Curved Contrastive Learning, a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder, has recently shown promising results for dialog modeling at far superior efficiency. While high efficiency is achieved by encoding utterances independently, this ignores the importance of contextualization. To overcome this issue, we introduce triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances through a novel Hebbian-inspired co-occurrence learning objective in a self-organizing manner, without using any weights, i.e., merely through local interactions. Empirically, we find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models, without requiring re-encoding. Our code (https://github.com/UKPLab/acl2024-triple-encoders) and model (https://huggingface.co/UKPLab/triple-encoders-dailydialog) are publicly available.
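The idea of building weight-free mixtures from independently encoded utterances can be illustrated with a minimal NumPy sketch. This is not the released implementation: random unit vectors stand in for real encoder outputs, the function names `pairwise_mixtures` and `score_candidates` are hypothetical, and averaging the mixture-to-candidate similarities without any weighting is an assumption made for simplicity. The sketch also ignores the distinction between the [B1] and [B2] spaces and pairs context embeddings directly.

```python
import numpy as np


def normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pairwise_mixtures(context):
    """Mean-pool every pair (i < j) of context embeddings into one mixture.

    No learnable weights are involved: each mixture is just the
    re-normalized average of two independently encoded utterances.
    """
    n = context.shape[0]
    i, j = np.triu_indices(n, k=1)  # index pairs with i < j
    return normalize((context[i] + context[j]) / 2.0)


def score_candidates(context, candidates):
    """Score each candidate by its mean cosine similarity to all mixtures.

    Assumption: a plain, unweighted average over mixtures.
    """
    mixtures = pairwise_mixtures(context)        # (n*(n-1)/2, d)
    sims = mixtures @ normalize(candidates).T    # (num_pairs, num_candidates)
    return sims.mean(axis=0)


# Placeholder embeddings: 4 context utterances, 3 candidate replies, dim 8.
rng = np.random.default_rng(0)
context = normalize(rng.normal(size=(4, 8)))
candidates = normalize(rng.normal(size=(3, 8)))

scores = score_candidates(context, candidates)
best = int(np.argmax(scores))  # index of the highest-scoring candidate
```

Because each utterance embedding is computed once and only combined afterwards, the embeddings can be cached across turns; a new turn requires encoding just the new utterance and forming its mixtures with the cached ones.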
