Bottleneck-based Encoder-decoder ARchitecture (BEAR) for Learning Unbiased Consumer-to-Consumer Image Representations
Pablo Rivas, Gisela Bichler, Tomas Cerny, Laurie Giddens, Stacie Petter
TL;DR
The paper addresses learning unbiased, privacy-preserving image representations for consumer-to-consumer (C2C) imagery to aid illicit-activity detection. It introduces BEAR, a bottleneck-based encoder–decoder architecture in an autoencoder configuration that combines ConvLSTM-based perceptual encoding, residual feature entanglement, and a multi-branch decoder to produce compact latent representations. The model is trained on roughly 2 million 128×128 color images and evaluated on C2C, CIFAR-10, and ImageNet data. The reported results indicate convergent training, latent spaces that cluster meaningfully under k-means, and informative UMAP visualizations, while the compact representation obscures personal identifiers. The authors position BEAR as a lightweight alternative to transformer-based or contrastive models that is less exposed to label bias, and they propose a future multimodal extension with text and contrastive learning as part of a trafficking-detection pipeline.
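To make the latent-clustering step concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to a set of latent vectors. It is not the authors' code: the vectors are synthetic stand-ins for encoder outputs, and the 8-dimensional latent size, the two-blob toy data, and the function names `make_toy_latents` and `kmeans` are assumptions made purely for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_toy_latents(n_per_cluster=50, dim=8, seed=1):
    """Hypothetical stand-in for encoder outputs: two Gaussian blobs.

    A real pipeline would pass images through the trained encoder;
    here we only fabricate toy vectors to exercise the clustering step.
    """
    rng = random.Random(seed)
    centers = [[0.0] * dim, [5.0] * dim]
    points, truth = [], []
    for idx, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([c + rng.gauss(0.0, 0.5) for c in center])
            truth.append(idx)
    return points, truth


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    latents, _ = make_toy_latents()
    labels, centroids = kmeans(latents, k=2)
    sizes = [labels.count(c) for c in range(2)]
    print("cluster sizes:", sizes)
```

In practice one would likely replace the toy vectors with the encoder's bottleneck outputs and use a library implementation of k-means. The pure-Python version here only keeps the sketch dependency-free.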
Abstract
Unbiased representation learning remains an open problem in specific applications and contexts. Novel architectures are usually crafted to solve particular problems by combining fundamental building blocks. This paper presents several image feature extraction mechanisms that work together with residual connections to encode perceptual image information in an autoencoder configuration. We use image data collected to support a larger research agenda on criminal activity in consumer-to-consumer online platforms. Preliminary results suggest that the proposed architecture can learn rich representation spaces from our dataset and from other image datasets, addressing the important challenges we identify.
