Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

Yoori Oh; Yoseob Han; Kyogu Lee

TL;DR

Audio-language retrieval suffers from limited data and many-to-one caption mappings, hindering robust cross-modal learning. The paper proposes a distance-sampling-based paraphraser that uses ChatGPT with few-shot prompting to generate distance-controlled paraphrases, guided by a distance d defined as the inverse of the Jaccard similarity between token sets of the ground-truth and paraphrase candidates. The method employs a three-stage pipeline: distance calculation and clustering, few-shot prompt sampling, and distance-constrained text manipulation to produce manipulated sentences. On AudioCaps, the approach yields competitive improvements in recall at rank k over baselines, with ablations showing favorable trade-offs between distance and the number of few-shot examples; the paraphraser can behave as an interpolator or extrapolator depending on distance. Overall, this scalable text augmentation framework enhances audio-language retrieval and informs prompt design for multimodal data generation.
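The distance described above can be illustrated with a small sketch. It rests on assumptions not confirmed by the summary: "inverse of the Jaccard similarity" is read here as the complement 1 - J (it could also mean 1/J), and tokens are taken to be lowercase whitespace-separated words. The function name `jaccard_distance` is hypothetical.

```python
def jaccard_distance(sentence_a: str, sentence_b: str) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens.

    Assumption: the distance is the complement of the Jaccard similarity;
    the paper's exact definition and tokenizer may differ.
    """
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty sentences are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


ground_truth = "a dog barks loudly"
candidate = "a dog barks in the distance"

# 3 shared tokens out of 7 distinct tokens, so d = 1 - 3/7
d = jaccard_distance(ground_truth, candidate)
print(f"distance = {d:.3f}")
```

Under this reading, a distance near 0 means the candidate reuses almost all of the ground-truth vocabulary, and a distance near 1 means the two sentences share almost no tokens.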

Abstract

There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, audio-text paired datasets often lack rich textual expression compared to the audio samples. One significant challenge is the presence of similar or identical captions for different audio samples. Under such many-to-one mapping conditions, audio-text datasets lead to poor retrieval performance. In this paper, we propose a novel approach to tackle this data imbalance problem in the audio-language retrieval task. We introduce a distance sampling-based paraphraser leveraging ChatGPT, which uses a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance measures the degree of manipulation between any two sentences, and ChatGPT's few-shot prompting is performed using a cluster of texts with similar distances, where the distance is defined by the Jaccard similarity. As a result, ChatGPT, when few-shot prompted with these text clusters, can adjust the diversity of the manipulated text according to the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
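As a rough illustration of grouping same-context sentences by distance and drawing few-shot examples from one group, the sketch below bins candidates by their distance to a ground-truth caption. Everything here is an assumption for illustration: the equal-width binning, the `n_bins` parameter, the helper names, and the reading of the distance as 1 minus the Jaccard similarity over whitespace tokens. The paper's actual clustering procedure is not specified in this summary.

```python
import random
from collections import defaultdict


def jaccard_distance(a: str, b: str) -> float:
    """Assumed distance: 1 - Jaccard similarity of whitespace token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return 1.0 - len(ta & tb) / len(union) if union else 0.0


def cluster_by_distance(ground_truth, candidates, n_bins=4):
    """Group candidates into equal-width distance bins (hypothetical scheme)."""
    clusters = defaultdict(list)
    for sentence in candidates:
        d = jaccard_distance(ground_truth, sentence)
        # Clamp d == 1.0 into the last bin so indices stay in [0, n_bins - 1].
        bin_index = min(int(d * n_bins), n_bins - 1)
        clusters[bin_index].append(sentence)
    return dict(clusters)


def sample_few_shot(clusters, bin_index, k, seed=0):
    """Draw up to k examples from one distance bin, reproducibly."""
    pool = clusters.get(bin_index, [])
    return random.Random(seed).sample(pool, min(k, len(pool)))


ground_truth = "a dog barks loudly"
candidates = [
    "a dog barks loudly outside",  # distance 0.2   -> bin 0
    "a dog is barking",            # distance 0.667 -> bin 2
    "birds sing in a tree",        # distance 0.875 -> bin 3
]

clusters = cluster_by_distance(ground_truth, candidates)
few_shot = sample_few_shot(clusters, bin_index=2, k=2)
```

Choosing a low bin would yield examples that stay close to the original wording, while a high bin would yield examples that depart from it more strongly.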

Paper Structure (15 sections, 2 equations, 2 figures, 3 tables)

Introduction
Related Works
Audio-Language representation learning
Multimodal learning with ChatGPT
Method
Audio-Text contrastive learning
Distance Sampling-based paraphraser
$1^{st}$ stage - Distance calculation for example clustering
$2^{nd}$ stage - Few-shot prompting examples of ChatGPT
$3^{rd}$ stage - Text manipulation with distance constraints
Experiments
Experimental setup
Model performance
Ablation study
Conclusion

Figures (2)

Figure 1: Overview of the proposed distance sampling-based paraphraser. (a) Original audio-text dataset pairs, (b) distance sampling-based paraphraser to manipulate text data, and (c) new audio-text dataset pairs satisfying a unique mapping.
Figure 2: Three-stage pipeline of the proposed distance sampling-based paraphraser. (a) $1^{st}$ stage to calculate the distance between the ground-truth sentence and candidate sentences, (b) $2^{nd}$ stage to perform few-shot prompting for ChatGPT using the examples clustered by the distance, and (c) $3^{rd}$ stage to generate the manipulated text satisfying the given distance.
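The second and third stages in Figure 2 can be pictured as assembling a few-shot prompt from distance-clustered examples and asking for a paraphrase at a target distance. The sketch below only builds the prompt string and makes no API call. The instruction wording, the `Input:`/`Output:` layout, and the function name `build_few_shot_prompt` are all hypothetical; the paper's actual prompt is not given in this summary.

```python
def build_few_shot_prompt(examples, target_sentence, target_distance):
    """Assemble a few-shot prompt from (source, paraphrase) example pairs.

    The examples are assumed to come from one distance cluster, so they
    demonstrate the degree of manipulation being requested.
    """
    lines = [
        "Paraphrase the given sentence so that its Jaccard distance "
        f"from the original is about {target_distance:.2f}.",
        "",
    ]
    for source, paraphrase in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {paraphrase}")
        lines.append("")
    # The final entry is left open for the model to complete.
    lines.append(f"Input: {target_sentence}")
    lines.append("Output:")
    return "\n".join(lines)


examples = [
    ("a dog barks loudly", "a dog is barking"),
    ("rain falls on a roof", "rain is hitting a rooftop"),
]
prompt = build_few_shot_prompt(examples, "a car engine starts", 0.65)
print(prompt)
```

In a full pipeline, the returned text would presumably be checked against the requested distance before being paired with its audio sample, though that verification step is an assumption rather than something stated here.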
