Inter-Processor Parallel Architecture (C2)

Description: This module provides an overview of the different types of parallel architectures used in multiprocessor systems. Most of the coverage is devoted to shared-memory multiprocessors, distributed-memory systems, and multicore architectures. Design principles and programming models for each type of architecture are discussed. GPUs, Simultaneous Multithreading (SMT), and Vector Processing (SSE) are also covered briefly. The second part of the module focuses on models of parallelism and their appropriateness for different architectures. Data, task, and pipelined parallelism are introduced through examples. The notions of message passing and shared-memory programming are also discussed in this context. The module ends with a discussion of performance issues in parallel computing, including Amdahl's Law, scalability, and load balancing.

Recommended Length: Two lectures (~1 hr 15 min each)

Recommended Course: Computer Architecture, Computer Organization

Topics and Learning outcomes (per NSF/IEEE-TCPP PDC Curriculum):

  • [Architecture] Taxonomy: Flynn's taxonomy, data vs. control parallelism, shared/distributed memory
  • [Architecture] Multicore: Describe how cores share resources (cache, memory) and resolve conflicts
  • [Architecture] Distributed memory: Know basic notions of messaging among processes, different ways of message passing, and collective operations
  • [Architecture] SMP: Understand concept of uniform access shared memory architecture
  • [Architecture] Message passing: Understand that shared-memory architectures break down when scaled, due to physical limitations (latency, bandwidth), motivating message-passing architectures
  • [Architecture] Simultaneous Multi-Threading (SMT): Distinguish SMT from multicore (based on which resources are shared)
  • [Architecture] Heterogeneity (e.g., accelerators and GPUs): Recognize that the cores in a multicore processor may not all be the same kind of core.
  • [Architecture] SIMD/Vector (e.g., SSE, Cray): Describe uses of SIMD/Vector (same operation on multiple data items), e.g., accelerating graphics for games.
  • [Programming] Computation decomposition strategies: Understand different ways to assign computations to threads or processes
  • [Programming] Load balancing: Understand the effects of load imbalances on performance, and ways to balance load across threads or processes
  • [Programming] Amdahl's law: Know that speedup is limited by the sequential portion of a parallel program, if problem size is kept fixed
  • [Algorithm] Scalability in algorithms and architectures: Comprehend via several examples that having access to more processors does not guarantee faster execution - the notion of inherent sequentiality.

Lecture Material: [ PDF ] [ PPT ]

Sample Source Code:

Pedagogical Notes: available for instructors only

Sample Exam Question: available for instructors only


Jun '15: Qasem speaks at HPC Workshop at Prairie View A&M

Oct '14: Paper accepted at SIGCSE15

Oct '14: Short paper accepted at EduHPC14 (co-located with SC14)

Aug '14: First regional workshop held at Texas State

May '14: Call for participation in first regional workshop

Mar '14: Qasem serves as panelist in SIGCSE special session on PDC

Nov '13: Poster presented at Supercomputing conference

Sep '13: Paper accepted at EduPDHPC13

Aug '13: Qasem participates in CSinParallel Four Corners Workshop

Jul '13: Qasem receives Early Adopter grant

Mar '13: Qasem presents at NSF Showcase at SIGCSE13

Jan '13: Five new modules implemented

Aug '12: Burtscher receives Early Adopter grant


Apan Qasem (PI)
Department of Computer Science
Texas State University
601 University Dr
San Marcos, TX 78666

Office: Comal 307A
Phone: (512) 245-0347
Fax: (512) 245-8750
E-mail: apan "AT" txstate · edu