Demystifying Object-based Big Data Storage Systems
Anindita Sarkar Mondal, Madhupa Sanyal, Ari Kusumastuti, Hrishav Bakul Barua, Kartick Chandra Mondal
TL;DR
The paper addresses the growing challenge of storing and accessing massive, unstructured data by cataloging architectural approaches across five storage categories: Distributed File Systems (DFS), Clustered File Systems (CFS), Cloud Storage, Archive Storage, and Object Storage Systems (OSS). It adopts a taxonomy-driven survey, detailing representative deployments from major providers and open-source projects, and emphasizes architectural primitives such as the separation of metadata from data, erasure coding, geo-caching, and policy-driven data placement. Key contributions include a comprehensive catalog of vendor implementations (e.g., S3, Atmos, ECS, Walrus, Scality RING, Cleversafe, WOS, Himalaya, StorageGRID, OpenStack Swift, Panasas, Lustre) and the articulation of mechanisms that drive scalability, durability, and multi-tenancy in object- and file-based storage systems. The findings illuminate how architectural choices shape performance, cost efficiency, data protection, and accessibility, offering practical guidance to storage consumers and developers designing systems for big data workloads.
Abstract
Today's era is a digitized one, and managing the big data it generates is an important concern for data scientists. The demand for big data storage systems grows day by day. Many organizations provide storage-related services, and they follow different architectures or storage models for storing big data. In this survey paper, we highlight the storage architectures provided by well-known storage service providers. On an architectural basis, we divide big data storage systems into five categories: Distributed File Systems (DFS), Clustered File Systems (CFS), Cloud Storage, Archive Storage, and Object Storage Systems (OSS). We also present a detailed architectural view of the big data storage systems that different organizations provide under each of these categories.
