Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
Sumit Mamtani, Abhijeet Bhure
TL;DR
This study evaluates how well frozen pretrained Transformer embeddings transfer to deception classification on the LIAR dataset. It compares encoder-only and decoder-only models, pooling versus padding for building document representations, and neural versus non-neural classification heads. Contextual embeddings, especially from BERT, paired with simple logistic regression often outperform more complex neural classifiers. Pooling generally yields more robust document representations than padding, and performance is only mildly sensitive to sequence length. The results emphasize the value of architecture-centric representations for veracity tasks and suggest that efficient, scalable deception detection can be built on high-quality frozen embeddings. Overall, the work identifies practical configurations that balance performance and efficiency for real-world misinformation monitoring.
Abstract
This paper investigates fake news detection as a downstream evaluation of Transformer representations. It benchmarks encoder-only and decoder-only pretrained models (BERT, GPT-2, Transformer-XL) as frozen embedders paired with lightweight classifiers. Controlled preprocessing experiments compare pooling versus padding and neural versus linear heads, and the results show that contextual self-attention encodings transfer effectively and consistently across these configurations. BERT embeddings combined with logistic regression outperform neural baselines on the LIAR dataset splits. Analyses of sequence length and aggregation reveal robustness to truncation and advantages for simple max or average pooling. This work positions attention-based token encoders as robust, architecture-centric foundations for veracity tasks, isolating the contribution of the Transformer representations from classifier complexity.
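To make the pooling-versus-padding distinction concrete, the following is a minimal sketch, not the authors' code. It assumes each statement's frozen token embeddings are available as a list of rows of shape (seq_len, hidden_dim). The function names `mean_pool`, `max_pool`, and `pad_or_truncate` and the `max_len` parameter are hypothetical. A real pipeline would also use an attention mask so that padding tokens are excluded from the pooled statistics.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per token: shape (seq_len, hidden_dim)


def mean_pool(token_embs: Matrix) -> Vector:
    """Average each hidden dimension over all tokens (hypothetical helper)."""
    n = len(token_embs)
    dim = len(token_embs[0])
    return [sum(tok[d] for tok in token_embs) / n for d in range(dim)]


def max_pool(token_embs: Matrix) -> Vector:
    """Take the element-wise maximum over all tokens (hypothetical helper)."""
    dim = len(token_embs[0])
    return [max(tok[d] for tok in token_embs) for d in range(dim)]


def pad_or_truncate(token_embs: Matrix, max_len: int) -> Vector:
    """Keep the first max_len tokens, zero-pad shorter inputs, then flatten.

    The output length is max_len * hidden_dim, so it depends on max_len,
    unlike the pooled vectors, whose length is always hidden_dim.
    """
    dim = len(token_embs[0])
    kept = token_embs[:max_len]  # truncation drops tokens past max_len
    padded = kept + [[0.0] * dim for _ in range(max_len - len(kept))]
    return [x for tok in padded for x in tok]


if __name__ == "__main__":
    # Toy stand-in for frozen embeddings: a 3-token statement, hidden_dim = 2.
    doc = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
    print(mean_pool(doc))           # [2.0, 2.0]
    print(max_pool(doc))            # [3.0, 4.0]
    print(pad_or_truncate(doc, 4))  # [1.0, 4.0, 3.0, 0.0, 2.0, 2.0, 0.0, 0.0]
    print(pad_or_truncate(doc, 2))  # [1.0, 4.0, 3.0, 0.0]
```

Pooling always produces a fixed hidden_dim-sized vector regardless of statement length, which makes it a natural input for a linear head such as logistic regression. The padded representation instead changes size with `max_len` and discards every token past the cutoff.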
