TexIm FAST: Text-to-Image Representation for Semantic Similarity Evaluation using Transformers

Wazib Ansar; Saptarsi Goswami; Amlan Chakrabarti

TL;DR

TexIm FAST addresses memory and cross-modal representation challenges by encoding variable-length text into fixed-length pictorial representations via a self-supervised CNN-TSLFN VAE. A dual-channel STS model processes these pictorial embeddings to evaluate semantic similarity, performing robustly on sequences of disparate length while reducing the memory footprint by over 75%. The approach achieves an approximately 6% gain in STS accuracy over the baseline and supports oblivious, privacy-preserving inference suited to resource-constrained devices. The work opens avenues for cross-modal text analysis and points to future directions such as adaptive resolution and broader applications in NLP and imaging.

Abstract

One of the principal objectives of Natural Language Processing (NLP) is to generate meaningful representations from text. Efforts to make these representations more informative have led to a tremendous rise in their dimensionality and memory footprint. This has a cascading effect, amplifying the complexity of downstream models by increasing their parameters. Moreover, the available techniques cannot be applied to cross-modal applications such as text-to-image. To ameliorate these issues, this paper proposes a novel Text-to-Image methodology for generating fixed-length representations through a self-supervised Variational Auto-Encoder (VAE) for semantic evaluation applying transformers (TexIm FAST). The pictorial representations allow oblivious inference while retaining linguistic intricacies, and are potent in cross-modal applications. TexIm FAST handles variable-length sequences and generates fixed-length representations with a memory footprint reduced by over 75%. It improves the efficiency of downstream models by reducing their parameters. The efficacy of TexIm FAST has been extensively analyzed for the task of Semantic Textual Similarity (STS) on the MSRPC, CNN/Daily Mail, and XSum data-sets. The results demonstrate a 6% improvement in accuracy compared to the baseline and showcase its exceptional ability to compare sequences of disparate length, such as a text and its summary.
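The abstract reports a memory footprint reduced by over 75%. As a rough, hypothetical illustration of where savings of this order can come from: storing each value as an 8-bit grey level instead of a 32-bit float cuts the per-value cost from 4 bytes to 1, i.e. by exactly 75%, and any saving from fixing the representation length would come on top of that. The sketch below assumes float32 embeddings, 8-bit min-max quantization, and a dimensionality of 768; none of these specifics is confirmed by this summary.

```python
import numpy as np

# Hypothetical dimensionality (assumption, not taken from the paper).
DIM = 768

# A dense embedding stored as 32-bit floats (4 bytes per value).
float_repr = np.random.default_rng(0).standard_normal(DIM).astype(np.float32)

# The same number of values quantized to 8-bit unsigned integers
# (1 byte per value), as in a grayscale image. Min-max quantization is assumed.
v_min, v_max = float_repr.min(), float_repr.max()
uint8_repr = np.round((float_repr - v_min) / (v_max - v_min) * 255).astype(np.uint8)

# Fractional memory saving from the change of storage type alone.
reduction = 1 - uint8_repr.nbytes / float_repr.nbytes
print(f"{float_repr.nbytes} B -> {uint8_repr.nbytes} B ({reduction:.0%} smaller)")
```

This prints `3072 B -> 768 B (75% smaller)`. The ratio is independent of `DIM`, which is why quantization alone lands exactly at 75% and a reduction of "over 75%" implies additional savings elsewhere in the pipeline.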

Paper Structure (27 sections, 2 equations, 8 figures, 7 tables, 2 algorithms)

Introduction
Related Works
Cross-Modal Representation
Semantic Similarity
Proposed Methodology
TexIm FAST
Pre-processing
Tokenization
Input Embedding
Low-Dimensional Projection
Normalization and Feature Scaling
Reshaping and Quantization
Architecture for STS Determination
Experiment Details
Data-Sets
...and 12 more sections
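The outline lists "Normalization and Feature Scaling" and "Reshaping and Quantization" as steps that turn a fixed-length vector into a picture. A minimal sketch of those two steps follows, under stated assumptions: the input is a stand-in for the VAE's fixed-length latent vector, its length is a perfect square (256, giving a 16x16 image), scaling is min-max, and quantization targets 8-bit grey levels. The function name `vector_to_image` is hypothetical and this is not the authors' implementation.

```python
import math

import numpy as np


def vector_to_image(z: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: map a fixed-length vector to a square 8-bit grayscale image."""
    side = math.isqrt(z.size)
    if side * side != z.size:
        raise ValueError("vector length must be a perfect square")
    # Normalization and feature scaling (assumed min-max): map values into [0, 1].
    span = z.max() - z.min()
    scaled = (z - z.min()) / span if span > 0 else np.zeros_like(z, dtype=float)
    # Quantization: map [0, 1] onto the 256 grey levels of an 8-bit image.
    pixels = np.round(scaled * 255).astype(np.uint8)
    # Reshaping: lay the values out row by row as a side x side image.
    return pixels.reshape(side, side)


latent = np.random.default_rng(42).standard_normal(256)  # stand-in for a VAE latent vector
img = vector_to_image(latent)
print(img.shape, img.dtype, img.min(), img.max())
```

Min-max scaling is chosen here because it spreads any real-valued vector over the full 0-255 range, so every image uses the same intensity scale; the paper's actual normalization and image resolution may differ.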

Figures (8)

Figure 1: Illustration of the proposed TexIm FAST VAE architecture for projection of sequences to fixed-length vectors
Figure 3: Illustration of the proposed TSLFN-based STS model
Figure 5: TexIm FAST representations corresponding to the first sequences for all the comparison objectives
Figure 6: TexIm FAST representations corresponding to the second sequences for all the comparison objectives
Figure 7: Comparison of the histograms of the TexIm FAST representations for all the comparison objectives
...and 3 more figures
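Figure 7 compares histograms of the TexIm FAST representations. As a hedged illustration of how pixel-intensity histograms of two 8-bit representations can be computed and compared, the sketch below uses normalized 256-bin histograms and histogram intersection. Histogram intersection is a generic image-processing measure swapped in for illustration, not necessarily what the paper uses, and the random 16x16 images are hypothetical stand-ins for real TexIm FAST outputs.

```python
import numpy as np


def grey_histogram(img: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of an 8-bit grayscale image."""
    counts = np.bincount(img.ravel(), minlength=256)
    return counts / counts.sum()


def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Overlap of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    return float(np.minimum(h1, h2).sum())


rng = np.random.default_rng(7)
img_a = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)  # hypothetical representation A
img_b = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)  # hypothetical representation B
print(histogram_intersection(grey_histogram(img_a), grey_histogram(img_a)))
print(histogram_intersection(grey_histogram(img_a), grey_histogram(img_b)))
```

Comparing an image with itself gives an overlap of 1.0. Note that histograms discard spatial layout, so a high overlap only indicates similar intensity distributions, not identical representations.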