Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation
Mingzhe Li, XieXiong Lin, Xiuying Chen, Jinxiong Chang, Qishen Zhang, Feng Wang, Taifeng Wang, Zhongyi Liu, Wei Chu, Dongyan Zhao, Rui Yan
TL;DR
This work tackles exposure bias and inadequate word-level guidance in text generation by introducing a hierarchical contrastive learning framework built on a CVAE. It combines instance-level KL-based distribution alignment, a keyword-level contrast driven by a keyword graph that polishes keyword representations, and a Mahalanobis inter-contrast that ties instance and keyword representations together through a distribution-aware metric and mitigates contrast vanishing. The framework yields improvements across paraphrasing, dialogue, and storytelling tasks on QQP, Douban, and RocStories. Automatic metrics and human judgments demonstrate the method’s effectiveness and robustness, and ablations confirm the necessity of each component. Overall, the paper presents a principled, distribution-aware, multi-granularity contrastive framework that enhances controllable text generation and semantic fidelity in multiple generation domains.
Abstract
Contrastive learning has achieved impressive success in generation tasks, where it mitigates the "exposure bias" problem and discriminatively exploits references of different quality. Existing works mostly focus on contrastive learning at the instance level without discriminating the contribution of each word, although keywords are the gist of the text and dominate the constrained mapping relationships. Hence, in this work, we propose a hierarchical contrastive learning mechanism that unifies semantic meaning at hybrid granularities in the input text. Concretely, we first propose a keyword graph via contrastive correlations of positive-negative pairs to iteratively polish the keyword representations. Then, we construct intra-contrasts within the instance level and the keyword level, where we assume words are sampled nodes from a sentence distribution. Finally, to bridge the gap between independent contrast levels and tackle the common contrast vanishing problem, we propose an inter-contrast mechanism that measures the discrepancy between contrastive keyword nodes with respect to the instance distribution. Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
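To make the inter-contrast idea concrete, the sketch below measures how far a positive and a negative keyword vector lie from an instance distribution using the Mahalanobis distance, then applies a hinge-style margin. It is only an illustration under stated assumptions: the diagonal Gaussian, the margin form of the loss, and the names `mahalanobis_diag` and `inter_contrast_margin` are hypothetical and are not taken from the paper, whose exact objective is not given in this abstract.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from point x to a Gaussian with diagonal covariance.

    Assumption: the instance distribution is summarized by a per-dimension
    mean and variance, as is common for a CVAE latent.
    """
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def inter_contrast_margin(pos_kw, neg_kw, mean, var, margin=1.0):
    """Hypothetical hinge loss: the positive keyword should sit closer to the
    instance distribution than the negative keyword by at least `margin`."""
    d_pos = mahalanobis_diag(pos_kw, mean, var)
    d_neg = mahalanobis_diag(neg_kw, mean, var)
    return max(0.0, d_pos - d_neg + margin)


# Toy two-dimensional example with made-up numbers.
mean = [0.0, 0.0]
var = [1.0, 4.0]
pos_kw = [1.0, 0.0]   # distance sqrt(1/1) = 1
neg_kw = [0.0, 6.0]   # distance sqrt(36/4) = 3

loss_ok = inter_contrast_margin(pos_kw, neg_kw, mean, var)       # 0.0
loss_swapped = inter_contrast_margin(neg_kw, pos_kw, mean, var)  # 3.0
```

Because the distance is scaled by the variance of the instance distribution, a keyword that deviates along a high-variance dimension is penalized less than one that deviates along a low-variance dimension, which is the sense in which such a metric is distribution-aware.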
