Leveraging OpenFlamingo for Multimodal Embedding Analysis of C2C Car Parts Data
Maisha Binte Rashid, Pablo Rivas
TL;DR
This study probes OpenFlamingo's ability to process large-scale multimodal data by analyzing 1.2 million C2C car-parts posts with images from OfferUp and Craigslist. It uses OpenFlamingo to generate joint text-image embeddings and applies $k$-means clustering into $20$ clusters, with visualization via UMAP on $70{,}000$ samples per dataset. The results show that most clusters capture coherent patterns (e.g., tires, lights, body parts), but several clusters lack clear structure, suggesting limitations when posts contain multiple images. The findings demonstrate the potential of scalable multimodal analysis for heterogeneous online data and highlight architectural improvements needed to better handle multi-image posts in real-world marketplaces.
Abstract
In this paper, we investigate the capabilities of multimodal machine learning models, particularly the OpenFlamingo model, in processing a large-scale dataset of consumer-to-consumer (C2C) online posts related to car parts. We collected data from two platforms, OfferUp and Craigslist, resulting in a dataset of over 1.2 million posts with their corresponding images. We used the OpenFlamingo model to extract embeddings for the text and image of each post, and then applied $k$-means clustering to the joint embeddings to identify underlying patterns and commonalities among the posts. We found that most clusters contain a coherent pattern, but some clusters show no internal pattern. The results suggest that OpenFlamingo can be used to find patterns in large datasets, but that its architecture needs some modification to suit the dataset.
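The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the joint embedding is formed by concatenating L2-normalized text and image vectors, which is one common fusion choice that the abstract does not confirm. Random arrays stand in for OpenFlamingo outputs, and the names `joint_embedding`, `assign`, and `kmeans` are hypothetical helpers. Only the cluster count, $k = 20$, is taken from the text.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def joint_embedding(text_emb, image_emb):
    """Assumed fusion: concatenate normalized text and image vectors."""
    return np.concatenate(
        [l2_normalize(text_emb), l2_normalize(image_emb)], axis=1
    )


def assign(x, centroids):
    """Return the index of the nearest centroid for each row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(x, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the final centroids.
    return assign(x, centroids), centroids


# Synthetic stand-ins for per-post embeddings (hypothetical sizes).
rng = np.random.default_rng(42)
text_emb = rng.normal(size=(500, 16))
image_emb = rng.normal(size=(500, 16))

joint = joint_embedding(text_emb, image_emb)
labels, centroids = kmeans(joint, k=20)  # k = 20 as in the paper
cluster_sizes = np.bincount(labels, minlength=20)
```

At the scale of 1.2 million posts, a library implementation such as scikit-learn's `KMeans` or `MiniBatchKMeans` would likely be preferable to this hand-written loop, which materializes a full distance tensor on each iteration.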
