Parallel Performance: Analysis and Evaluation (E2)

Description: This module concentrates on performance estimation and measurement of parallel systems, including efficiency, linear and super-linear speedup, throughput, data locality, weak and strong scalability, and load balance. Performance estimation of sequential architectures and the implications of Amdahl's law are typically covered in current computer architecture courses. This module extends these concepts and investigates parallel performance in light of Amdahl's law. It explores modern parallel benchmark suites such as PARSEC (task, data, and pipelined parallelism) and Lonestar (amorphous parallelism) and demonstrates how to write benchmark programs that measure the performance of parallel hardware (e.g., the communication latency at various levels). It also discusses how to identify the potential for speedup, upper bounds on speedup, and performance obstacles.

Recommended Length: One lecture (~75 minutes)

Recommended Course: Compilers, Computer Architecture, Upper-level CS elective

Topics and Learning outcomes (per NSF/IEEE-TCPP PDC Curriculum):

  • [Programming] Gustafson's Law: understand the idea of weak scaling, where the problem size increases as the number of processes/threads increases
  • [Architecture] Latency: know the concept, its implications for scaling, and its impact on the work/communication ratio needed to achieve speedup
  • [Architecture] Bandwidth: know the concept, how it limits sharing, and considerations of data movement cost
  • [Algorithm] Cost reduction: be exposed to a variety of computational costs other than time that can benefit from parallelism (a more advanced extension of speedup)
  • [Algorithm] Speedup: recognize the use of parallelism either to solve a given problem instance faster or to solve a larger instance in the same time (strong and weak scaling)
  • [Programming] Program transformations: be able to perform simple loop transformations by hand, and understand how they impact the performance of the resulting code (e.g., loop fusion, fission, skewing)
  • [Architecture] Peak performance: understand peak performance and why it is rarely valid for estimating real performance; illustrate common fallacies
  • [Architecture] Sustained performance: know the difference between peak and sustained performance, how to define and measure it, and which benchmarks apply

Lecture Material: [ PDF ] [ PPT ]

Sample Source Code:

  • OpenMP microbenchmarks illustrating performance interaction between thread affinity and data locality: [ tar.gz ]

Pedagogical Notes: available for instructors only

Sample Exam Question: available for instructors only


Jun '15: Qasem speaks at HPC Workshop at Prairie View A & M

Oct '14: Paper accepted at SIGCSE15

Oct '14: Short paper accepted at EduHPC14 (co-located with SC14)

Aug '14: First regional workshop held at Texas State

May '14: Call for participation in first regional workshop

Mar '14: Qasem serves as panelist in SIGCSE special session on PDC

Nov '13: Poster presented at the Supercomputing conference

Sep '13: Paper accepted at EduPDHPC13

Aug '13: Qasem participates in CSinParallel Four Corners Workshop

Jul '13: Qasem receives Early Adopter grant

Mar '13: Qasem presents at NSF Showcase at SIGCSE13

Jan '13: Five new modules implemented

Aug '12: Burtscher receives Early Adopter grant


Apan Qasem (PI)
Department of Computer Science
Texas State University
601 University Dr
San Marcos, TX 78666

Office: Comal 307A
Phone: (512) 245-0347
Fax: (512) 245-8750
E-mail: apan "AT" txstate · edu