What Every Computer Scientist Needs To Know About Parallelization

Temitayo Adefemi

TL;DR

The paper surveys the theory and practice of parallelization, tracing the evolution from PRAM through BSP and LogP to multicore and heterogeneous systems, and contrasting models with practical speedup limits via Amdahl’s and Gustafson’s laws. It maps core paradigms (processes, threads, and the Actor model) and patterns (geometric decomposition, pipelines, recursive data), emphasizing how problem characteristics, size, and hardware shape performance. A central case study on a road traffic simulation demonstrates pattern choices, synchronization, memory considerations, and language-hardware trade-offs, highlighting MPI-based implementations and language performance gaps. The work emphasizes tooling, scalability evaluation, and deployment practices, offering concrete guidance for designing, optimizing, and validating parallel applications across diverse architectures. Overall, it links theory to practice to equip computer scientists with the skills to design robust, scalable, and efficient parallel software in an increasingly concurrent landscape.
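The two speedup laws named above can be compared numerically. The following is a minimal sketch, not the paper's own code: the function names and the 95% parallel fraction are assumptions chosen for illustration. Amdahl's law keeps the problem size fixed, while Gustafson's law lets the problem grow with the number of workers, so the parallel fraction is measured against the serial run in the first case and against the scaled parallel run in the second.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Fixed problem size: S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and workers >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Problem size scaled with the workers: S = (1 - p) + p * n."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and workers >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction, for illustration only
    for n in (1, 8, 64, 1024):
        print(f"n={n:5d}  Amdahl={amdahl_speedup(p, n):7.2f}  "
              f"Gustafson={gustafson_speedup(p, n):8.2f}")
```

Under Amdahl's law the speedup for this assumed fraction can never exceed 1 / (1 - 0.95) = 20, however many workers are added, whereas the Gustafson estimate keeps growing linearly because the workload grows too.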

Abstract

Parallelization has become a cornerstone of modern computing, influencing everything from high-performance supercomputers to everyday mobile devices. This paper presents a comprehensive guide to the fundamentals of parallelization that every computer scientist should know, beginning with a historical perspective that traces the evolution from early theoretical models such as PRAM and BSP to today's advanced multicore and heterogeneous architectures. We explore essential theoretical frameworks, practical paradigms, and synchronization mechanisms while discussing implementation strategies using processes, threads, and modern models like the Actor framework. Additionally, we examine how hardware components, including CPUs, caches, memory, and accelerators, interact with software to affect performance, scalability, and load balancing. This work demystifies parallel programming by integrating historical context, theoretical underpinnings, and practical case studies. It equips readers with the tools to design, optimize, and troubleshoot parallel applications in an increasingly concurrent computing landscape.

Paper Structure

This paper contains 71 sections, 3 equations, 12 figures, 3 tables, 1 algorithm.

Figures (12)

  • Figure 1: A schematic showing how a problem is broken down into instructions and executed by a processor. An example function process_data() is also shown.
  • Figure 2: Parallel execution of process_data() for multiple inputs. Each input (e.g., emp1) is processed separately in parallel.
  • Figure 3: Processes in an Operating System. Each process runs in its own isolated memory space and maintains separate resources.
  • Figure 4: Threads in an Operating System. Threads share the process's memory (code, data, heap) while each maintains its own stack for execution context.
  • Figure 5: Scaling results of a cellular automaton parallelised using the Message Passing Interface (MPI).
  • ...and 7 more figures