Demystifying Object-based Big Data Storage Systems

Anindita Sarkar Mondal, Madhupa Sanyal, Ari Kusumastuti, Hrishav Bakul Barua, Kartick Chandra Mondal

TL;DR

The paper addresses the growing challenge of storing and accessing massive, unstructured data by cataloging architectural approaches across five storage categories: DFS, CFS, Cloud Storage, Archive Storage, and OSS. It adopts a taxonomy-driven survey, detailing representative deployments from major providers and open-source projects, and emphasizes architectural primitives such as metadata vs. data separation, erasure coding, geo-caching, and policy-driven data placement. Key contributions include a comprehensive catalog of vendor implementations (e.g., S3, Atmos, ECS, Walrus, Scality RING, Cleversafe, WOS, Himalaya, StorageGRID, OpenStack Swift, Panasas, Lustre) and the articulation of mechanisms that drive scalability, durability, and multi-tenancy in object- and file-based storage systems. The findings illuminate how architectural choices shape performance, cost efficiency, data protection, and accessibility, offering practical guidance to storage consumers and developers designing systems for big data workloads.

Abstract

We live in a digitized era, and managing the big data it generates is a central concern for data scientists. The demand for big data storage systems grows day by day. Many organizations provide storage-related services, and they follow different architectures or storage models for storing big data. In this survey paper, we highlight the storage architectures provided by renowned storage service providers. On an architectural basis, we divide big data storage systems into five categories: Distributed File Systems (DFS), Clustered File Systems (CFS), Cloud Storage, Archive Storage, and Object Storage Systems (OSS). We also present a detailed architectural view of the big data storage systems that different organizations provide under each category.

Paper Structure (31 sections, 20 figures, 3 tables)

This paper contains 31 sections, 20 figures, 3 tables.

Figures (20)

  • Figure 1: Country-wise plot of the popularity of storage systems as retrieved from Google Trends. (The readers are requested to view the online version of this colored image.)
  • Figure 2: Organization of the article
  • Figure 5: Taxonomy of Big Data Storage System.
  • Figure 6: Operation of Amazon S3.
  • Figure 7: ECS Platform Services.
  • ...and 15 more figures