Analysis of Distributed Algorithms for Big-data
Rajendra Purohit, K R Chowdhary, S D Purohit
TL;DR
The paper investigates scalable distributed and parallel processing for big data using MapReduce architectures on open-source platforms such as Hadoop/HDFS. It contrasts traditional block-based processing with the MapReduce paradigm, detailing the map/reduce workflow, fault tolerance, and the programming model. Through word-frequency experiments across Hadoop MR, Spark, Hive, and OpenMP, it demonstrates how distributed frameworks achieve scalability, while highlighting distributed file system (DFS) overheads and near-linear performance in certain configurations. The findings emphasize the practicality of MapReduce for large-scale data tasks and provide insights into platform-specific performance trade-offs, all using open-source tools. Overall, the work substantiates MapReduce as a robust approach for scalable, fault-tolerant big-data processing, with clear implications for system design and deployment.
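The map/reduce workflow behind the word-frequency experiments can be sketched in a single process. This is a minimal illustration, not Hadoop's or Spark's API and not the authors' implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_frequency`) are hypothetical, and tokenizing on whitespace with lowercasing is an assumption. In a real framework the map tasks run on separate nodes over HDFS blocks, and the shuffle moves intermediate pairs across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    # Lowercasing is an assumed normalization, not taken from the paper.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values emitted for the same key, which the
    # framework does between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: sum the partial counts collected for one word.
    return key, sum(values)


def word_frequency(lines):
    # Hypothetical driver chaining map -> shuffle -> reduce locally.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "quick quick"]
    print(word_frequency(docs))
    # prints {'the': 2, 'quick': 3, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Because each call to `map_phase` depends only on its own line and each `reduce_phase` only on one key's values, both stages can be spread across workers. This independence is what lets the paradigm scale and lets a failed task simply be re-executed.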
Abstract
Parallel and distributed processing are becoming the de facto industry standard, and a large part of current research is aimed at making computing scalable and distributed dynamically, without allocating resources on a permanent basis. The present article studies the performance of distributed and parallel algorithms and their file systems in achieving scalability at the local level (the OpenMP platform) and at the global level, where both computing and file systems are distributed. Various applications, algorithms, and file systems are used to demonstrate these areas, and their performance studies are presented. The systems and applications chosen here are open source, owing to their wider applicability.
