DataCube: A Video Retrieval Platform via Natural Language Semantic Profiling
Yiming Ju, Hanyu Zhao, Quanyue Ma, Donglin Hao, Chengwei Wu, Ming Li, Songjing Wang, Tengfei Pan
TL;DR
DataCube tackles the challenge of extracting task-specific content from massive video repositories by replacing a single global embedding with structured semantic profiles. It uses large vision-language models to build multi-dimensional natural-language profiles of video clips and indexes them with Milvus, supporting embedding-based retrieval followed by neural re-ranking, as well as a deep retrieval path for complex queries. The platform integrates automatic preprocessing, quality control, and a coarse-to-fine retrieval pipeline guided by GPT-based query enrichment, all delivered through an interactive web interface for constructing customized datasets. DataCube supports both public and private collections and runs on scalable hardware. It demonstrates practical scalability to hundreds of millions of clips, and because semantic profiles can be reused, it lowers data-preparation costs for video-centric research and applications.
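The coarse-to-fine idea can be illustrated with a minimal, self-contained sketch. This is not DataCube's implementation; several stand-ins are assumed. An in-memory list replaces the Milvus index, hand-written 3-dimensional vectors replace model embeddings, and a simple term-overlap score over the profile fields replaces the neural re-ranker. The names `ClipProfile`, `coarse_retrieve`, `rerank_score`, and `search`, and the profile dimensions `scene` and `action`, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClipProfile:
    """Hypothetical structured profile: several natural-language fields plus one embedding."""
    clip_id: str
    fields: dict          # e.g. {"scene": "...", "action": "..."} (assumed dimensions)
    embedding: list       # stand-in for a model-produced vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_retrieve(query_emb, profiles, k):
    """Stage 1 (coarse): keep the k profiles whose embeddings are closest to the query."""
    ranked = sorted(profiles, key=lambda p: cosine(query_emb, p.embedding), reverse=True)
    return ranked[:k]


def rerank_score(query_terms, profile):
    """Stage 2 (fine): stand-in for a neural re-ranker.
    Scores the fraction of query terms that appear in any profile field."""
    vocab = set(" ".join(profile.fields.values()).lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for t in query_terms if t in vocab) / len(query_terms)


def search(query_text, query_emb, profiles, k_coarse=3, k_final=2):
    """Coarse embedding retrieval, then re-rank the candidates on profile text."""
    candidates = coarse_retrieve(query_emb, profiles, k_coarse)
    terms = query_text.lower().split()
    reranked = sorted(candidates, key=lambda p: rerank_score(terms, p), reverse=True)
    return reranked[:k_final]


# Toy data (entirely made up for illustration).
profiles = [
    ClipProfile("c1", {"scene": "city street at night", "action": "car driving"}, [0.9, 0.1, 0.0]),
    ClipProfile("c2", {"scene": "beach at sunset", "action": "dog running"}, [0.1, 0.9, 0.0]),
    ClipProfile("c3", {"scene": "rainy city street", "action": "people walking"}, [0.8, 0.2, 0.1]),
    ClipProfile("c4", {"scene": "kitchen", "action": "chef cooking"}, [0.0, 0.1, 0.9]),
]
query_emb = [0.85, 0.15, 0.05]

coarse = coarse_retrieve(query_emb, profiles, 3)
results = search("people walking city street", query_emb, profiles, k_coarse=3, k_final=2)
print([p.clip_id for p in coarse])
print([p.clip_id for p in results])
```

In this toy run, the embedding stage ranks `c1` first. Re-ranking over the profile text then promotes `c3`, whose fields match every query term. This reflects the motivation for pairing a cheap vector search with a finer-grained second stage.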
Abstract
Large-scale video repositories are increasingly available for modern video understanding and generation tasks. However, transforming raw videos into high-quality, task-specific datasets remains costly and inefficient. We present DataCube, an intelligent platform for automatic video processing, multi-dimensional profiling, and query-driven retrieval. DataCube constructs structured semantic representations of video clips and supports hybrid retrieval with neural re-ranking and deep semantic matching. Through an interactive web interface, users can efficiently construct customized video subsets from massive repositories for training, analysis, and evaluation, and build searchable systems over their own private video collections. The system is publicly accessible at https://datacube.baai.ac.cn/. Demo Video: https://baai-data-cube.ks3-cn-beijing.ksyuncs.com/custom/Adobe%20Express%20-%202%E6%9C%8818%E6%97%A5%20%281%29%281%29%20%281%29.mp4
