Demystifying Object-based Big Data Storage Systems

Anindita Sarkar Mondal; Madhupa Sanyal; Ari Kusumastuti; Hrishav Bakul Barua; Kartick Chandra Mondal

TL;DR

The paper addresses the growing challenge of storing and accessing massive, unstructured data by cataloging architectural approaches across five storage categories: DFS, CFS, Cloud Storage, Archive Storage, and OSS. It adopts a taxonomy-driven survey, detailing representative deployments from major providers and open-source projects, and emphasizes architectural primitives such as metadata vs. data separation, erasure coding, geo-caching, and policy-driven data placement. Key contributions include a comprehensive catalog of vendor implementations (e.g., S3, Atmos, ECS, Walrus, Scality RING, Cleversafe, WOS, Himalaya, StorageGRID, OpenStack Swift, Panasas, Lustre) and the articulation of mechanisms that drive scalability, durability, and multi-tenancy in object- and file-based storage systems. The findings illuminate how architectural choices shape performance, cost efficiency, data protection, and accessibility, offering practical guidance to storage consumers and developers designing systems for big data workloads.
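To make one of the primitives named above concrete, the following is a minimal sketch of erasure coding, assuming the simplest possible scheme: k data fragments plus a single XOR parity fragment, which tolerates the loss of any one fragment. The function names (`encode`, `recover`, `decode`) are hypothetical and not taken from the paper or from any surveyed system; production object stores generally use more general codes that tolerate several simultaneous losses.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad so fragments are equal size
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def recover(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) by XOR-ing the others."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost fragment")
    repaired = list(frags)
    if missing:
        present = [f for f in frags if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


def decode(frags: List[bytes], original_len: int) -> bytes:
    """Concatenate the data fragments (all but the parity) and strip the padding."""
    return b"".join(frags[:-1])[:original_len]


if __name__ == "__main__":
    payload = b"object storage payload"
    stored = encode(payload, k=4)   # 4 data fragments + 1 parity fragment
    stored[2] = None                # simulate losing one storage node
    assert decode(recover(stored), len(payload)) == payload
```

The storage overhead in this sketch is (k + 1) / k, compared with 2x or 3x for full replication, which is the kind of cost-versus-durability trade-off that the summary attributes to architectural choices.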

Abstract

We live in a digitized era in which data is generated at massive scale. Managing this big data is a major concern for data scientists, and the demand for big data storage systems grows day by day. Many organizations provide storage-related services, and they follow different architectures or storage models for storing big data. In this survey paper, we highlight the storage architectures offered by well-known storage service providers. On an architectural basis, we divide big data storage systems into five categories: Distributed File Systems (DFS), Clustered File Systems (CFS), Cloud Storage, Archive Storage, and Object Storage Systems (OSS). We also present a detailed architectural view of the big data storage systems that different organizations provide under each category.

Paper Structure (31 sections, 20 figures, 3 tables)

Introduction
Background Knowledge
Taxonomy of Big Data Storage System
Cloud Storage
Amazon Simple Storage Service (S3)
Mezeo Cloud Storage
EMC Atmos
EMC Elastic Cloud Storage (ECS)
Eucalyptus Walrus Object Storage
Archive Storage
Hitachi Data Systems (HDS) - Hitachi Content Platform (HCP)
Storiant Object Storage
Object Storage System (OSS)
Scality RING Object Storage System
IBM Cleversafe Object Storage
...and 16 more sections

Figures (20)

Figure 1: Country-wise plot of the popularity of storage systems as retrieved from Google Trends. (The readers are requested to view the online version of this colored image.)
Figure 2: Organization of the article
Figure 5: Taxonomy of Big Data Storage System.
Figure 6: Operation of Amazon S3.
Figure 7: ECS Platform Services.
...and 15 more figures
