Parallel Performance: Analysis and Evaluation (E2)

Description: This module concentrates on performance estimation and measurement of parallel systems, including efficiency, linear and super-linear speedup, throughput, data locality, weak and strong scalability, and load balance. Performance estimation of sequential architectures and the implications of Amdahl's law are typically covered in current computer architecture courses. This module extends these concepts and investigates parallel performance in light of Amdahl's law. It explores modern parallel benchmark suites such as PARSEC (task, data, and pipelined parallelism) and Lonestar (amorphous parallelism) and demonstrates how to write benchmark programs that measure the performance of parallel hardware (e.g., the communication latency at various levels). It also discusses how to identify the potential for speedup, upper bounds on speedup, and performance obstacles.

Recommended Length: One lecture (~75 minutes)

Recommended Course: Compilers, Computer Architecture, Upper-level CS elective

Topics and Learning outcomes (per NSF/IEEE-TCPP PDC Curriculum):

  • [Programming] Gustafson's Law: understand the idea of weak scaling, where the problem size increases as the number of processes/threads increases
  • [Architecture] Latency: know the concept, its implications for scaling, and its impact on the work/communication ratio needed to achieve speedup
  • [Architecture] Bandwidth: know the concept, how it limits sharing, and considerations of data movement cost
  • [Algorithm] Cost reduction: be exposed to a variety of computational costs other than time that can benefit from parallelism (a more advanced extension of speedup)
  • [Algorithm] Speedup: recognize the use of parallelism either to solve a given problem instance faster or to solve a larger instance in the same time (strong and weak scaling)
  • [Programming] Program transformations: be able to perform simple loop transformations by hand, and understand how they impact the performance of the resulting code (e.g., loop fusion, fission, skewing)
  • [Architecture] Peak performance: understand peak performance and why it is rarely valid for estimating real performance; illustrate common fallacies
  • [Architecture] Sustained performance: know the difference between peak and sustained performance, how to define and measure it, and which benchmarks apply

Lecture Material: [ PDF ] [ PPT ]

Sample Source Code:

  • OpenMP microbenchmarks illustrating performance interaction between thread affinity and data locality: [ tar.gz ]

Pedagogical Notes: available for instructors only

Sample Exam Question: available for instructors only


Jun '15: Qasem speaks at HPC Workshop at Prairie View A & M

Oct '14: Paper accepted at SIGCSE15

Oct '14: Short paper accepted at EduHPC14 (co-located with SC14)

Aug '14: First regional workshop held at Texas State

May '14: Call for participation in first regional workshop

Mar '14: Qasem serves as panelist in SIGCSE special session on PDC

Nov '13: Poster presented at the Supercomputing conference

Sep '13: Paper accepted at EduPDHPC13

Aug '13: Qasem participates in CSinParallel Four Corners Workshop

Jul '13: Qasem receives Early Adopter grant

Mar '13: Qasem presents at NSF Showcase at SIGCSE13

Jan '13: Five new modules implemented

Aug '12: Burtscher receives Early Adopter grant


Apan Qasem (PI)
Department of Computer Science
Texas State University
601 University Dr
San Marcos, TX 78666

Office: Comal 307A
Phone: (512) 245-0347
Fax: (512) 245-8750
E-mail: apan "AT" txstate · edu