Parallelization Techniques (B)

Description: This module focuses on non-recursive methods of parallel decomposition, covering techniques for exploiting both data and task parallelism. The first lecture introduces these ideas at the algorithmic level, using array scanning as the example problem. The second lecture focuses on the programming issues of data parallelism: students learn to implement a simple data-parallel program in several ways using OpenMP, and metrics for evaluating parallel performance are covered as well. The third lecture uses parallel linked-list operations as an example of how to identify problematic interactions between threads and how to avoid them with locks, and presents advanced techniques for maximizing parallelism while minimizing locking overhead.
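
To make the third lecture's subject concrete, here is a minimal sketch of a linked-list insertion protected by a single coarse-grained Pthreads mutex; it is illustrative only and not taken from the module's materials. The lectures go beyond this baseline to finer-grained schemes that lock less and expose more parallelism.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct node { int val; struct node *next; } node_t;

    typedef struct {
        node_t *head;
        pthread_mutex_t lock;  /* one coarse-grained lock for the whole list */
    } list_t;

    /* Insert at the head. Without the mutex, two threads updating head at
       the same time could each link their node to the same old head, and
       one insertion would be lost -- a classic problematic interaction. */
    static void list_push_front(list_t *l, int v) {
        node_t *n = malloc(sizeof *n);
        n->val = v;
        pthread_mutex_lock(&l->lock);
        n->next = l->head;
        l->head = n;
        pthread_mutex_unlock(&l->lock);
    }

    static void *worker(void *arg) {
        for (int i = 0; i < 1000; i++) list_push_front(arg, i);
        return NULL;
    }

    int main(void) {
        list_t l = { NULL, PTHREAD_MUTEX_INITIALIZER };
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, &l);
        pthread_create(&t2, NULL, worker, &l);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        int count = 0;
        for (node_t *n = l.head; n; n = n->next) count++;
        printf("%d nodes (expected 2000)\n", count);
        return 0;
    }

Compiled with gcc -pthread, the count is always 2000; commenting out the lock/unlock pair makes lost updates possible. The coarse lock serializes all insertions, which is exactly the overhead the advanced techniques in the lecture aim to reduce.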

Recommended Length: Four lecture hours (1 hour for Part 1, 1 hour for Part 2, and 2 hours for Part 3)

Recommended Courses: CS 2 (Part 1), Algorithms (Part 2), Data Structures (Part 3)

Topics and Learning Outcomes (per NSF/IEEE-TCPP PDC Curriculum):

  • [Programming] Shared memory: Be able to write correct thread-based programs (protecting shared data) and understand how to obtain speed up.
  • [Programming] Compiler directives/pragmas: Understand what simple directives, such as those of OpenMP, mean (parallel for, concurrent sections), and be able to give examples.
  • [Programming] Parallel loops for shared memory: Know, through an example, one way to implement parallel loops, understand collision/dependencies across iterations (e.g., OpenMP).
  • [Programming] Data races: Know what a data race is, and how to use synchronization to prevent it.
  • [Programming] Computation decomposition strategies: Understand different ways to assign computations to threads or processes.
  • [Programming] Performance metrics: Know the basic definitions of performance metrics (runtime, speedup, efficiency); a measurement sketch illustrating these follows this list.
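
The loop, data-race, and metrics outcomes above can be tied together with a small measurement harness. The following is an illustrative sketch, not part of the module's materials: it times a serial and an OpenMP-parallel array sum and reports runtime, speedup S = T(1)/T(p), and efficiency E = S/p. The reduction clause gives each thread a private accumulator; without it, the concurrent updates to psum would be a data race.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000  /* illustrative problem size */

    int main(void) {
        double *a = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) a[i] = 1.0;

        /* serial baseline: T(1) */
        double t0 = omp_get_wtime();
        double ssum = 0.0;
        for (long i = 0; i < N; i++) ssum += a[i];
        double t_serial = omp_get_wtime() - t0;

        /* parallel version: T(p); without reduction(+: psum), the
           concurrent updates to psum would be a data race */
        t0 = omp_get_wtime();
        double psum = 0.0;
        #pragma omp parallel for reduction(+: psum)
        for (long i = 0; i < N; i++) psum += a[i];
        double t_parallel = omp_get_wtime() - t0;

        int p = omp_get_max_threads();
        double speedup = t_serial / t_parallel;  /* S = T(1) / T(p) */
        double efficiency = speedup / p;         /* E = S / p       */
        printf("p=%d runtime=%.3fs speedup=%.2f efficiency=%.2f\n",
               p, t_parallel, speedup, efficiency);
        printf("checksums: %.0f %.0f\n", ssum, psum);
        free(a);
        return 0;
    }

With gcc -O2 -fopenmp, a loop this memory-bound typically shows speedup well below the thread count, which makes the distinction between speedup and efficiency tangible.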

Lecture Material: [ PDF ] [ PPT ]

Sample Source Code (the OpenMP outer-loop and reduction variants are sketched after this list):

  • Ranksort with outer loop parallelized using OpenMP [ C file ]
  • Ranksort with inner loop parallelized using OpenMP and an atomic [ C file ]
  • Ranksort with inner loop parallelized using OpenMP and reduction [ C file ]
  • Ranksort with outer loop parallelized using Pthreads [ C file ]
  • Ranksort with inner loop parallelized using Pthreads and an atomic [ C file ]
  • Ranksort with inner loop parallelized using Pthreads and reduction [ C file ]
  • Ranksort with outer loop parallelized using CUDA [ cu file ]
  • Ranksort with inner loop parallelized using CUDA and an atomic [ cu file ]
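
For readers without access to the files, the following sketch (illustrative, not the distributed code) shows the ranksort kernel in two of the variants listed above. It assumes distinct keys, since an element's rank, the number of smaller elements, is only a unique output position when no keys repeat.

    #include <omp.h>
    #include <stdio.h>

    #define N 8

    /* Variant 1: parallelize the outer loop. Each thread computes the
       ranks of different elements and writes disjoint output slots, so
       no synchronization is needed. */
    void ranksort_outer(const int *in, int *out, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            int rank = 0;
            for (int j = 0; j < n; j++)
                if (in[j] < in[i]) rank++;
            out[rank] = in[i];
        }
    }

    /* Variant 2: parallelize the inner loop. The counter is now shared
       by all threads, so it is combined with reduction(+: rank); the
       atomic variants instead guard the increment with an atomic. */
    void ranksort_inner(const int *in, int *out, int n) {
        for (int i = 0; i < n; i++) {
            int rank = 0;
            #pragma omp parallel for reduction(+: rank)
            for (int j = 0; j < n; j++)
                if (in[j] < in[i]) rank++;
            out[rank] = in[i];
        }
    }

    int main(void) {
        int a[N] = { 5, 3, 8, 1, 9, 2, 7, 4 }, b[N];
        ranksort_outer(a, b, N);
        for (int i = 0; i < N; i++) printf("%d ", b[i]);
        printf("\n");
        return 0;
    }

The contrast between the two decompositions is the point of the file set: the outer-loop version avoids synchronization entirely because iterations are independent, while the inner-loop version extracts parallelism from a single rank computation but must pay for the shared counter with an atomic or a reduction.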

Pedagogical Notes: available for instructors only

Sample Exam Question: available for instructors only

News

Jun '15: Qasem speaks at HPC Workshop at Prairie View A&M

Oct '14: Paper accepted at SIGCSE15

Oct '14: Short paper accepted at EduHPC14 (co-located with SC14)

Aug '14: First regional workshop held at Texas State

May '14: Call for participation in first regional workshop

Mar '14: Qasem serves as panelist in SIGCSE special session on PDC

Nov '13: Poster presented at Supercomputing conference

Sep '13: Paper accepted at EduPDHPC13

Aug '13: Qasem participates in CSinParallel Four Corners Workshop

Jul '13: Qasem receives Early Adopter grant

Mar '13: Qasem presents at NSF Showcase at SIGCSE13

Jan '13: Five new modules implemented

Aug '12: Burtscher receives Early Adopter grant

Contact

Apan Qasem (PI)
Department of Computer Science
Texas State University
601 University Dr
San Marcos, TX 78666

Office: Comal 307A
Phone: (512) 245-0347
Fax: (512) 245-8750
E-mail: apan "AT" txstate · edu