CS 671 Performance Aware Reliable Parallel Software for Multicore Processors

General Information


This is a seminar course. Students are required to do individual projects and critically review papers. Email the instructor if you have questions.



Course Description

This course examines parallel programming, debugging, and testing on modern multicore processors. Unlike the sequential single-core processors of the past, utilizing a multicore processor requires programmers to identify parallelism and write explicitly parallel code. Furthermore, debugging the inevitable concurrency bugs in these parallel programs is notoriously difficult.

In the first part of the course, we will explore topics that help the programmer express parallelism. Topics covered include: relevant architectural trends and aspects of multicores; approaches for writing multicore software by extracting data parallelism (vectors and SIMD), thread-level parallelism, and task-based parallelism; efficient synchronization; and program profiling and performance tuning.

In the second part of the course, we will explore dynamic analyses and runtime systems that help programmers write correct, scalable software and diagnose bugs in parallel programs. The course focuses primarily on mainstream shared-memory multicores, with some coverage of graphics processing units (GPUs). Several programming assignments and a course project will give students first-hand experience with programming, experimentally analyzing, tuning, and debugging multicore software. Students are expected to have a reasonable understanding of computer architecture and strong programming skills (including experience with C/C++).

Course Objectives

This course will provide the necessary background that will enable students to write efficient and correct parallel programs.

Course Evaluation

Academic Integrity

The goal of this course is to learn, and collaboration is encouraged. However, you should not copy the actual code for programming assignments, the paper reviews, or the wording for written homework. Any violation of these rules will be considered cheating and dealt with severely. Here are links to the Rutgers University Academic Integrity Policy and the Department of Computer Science Integrity Policy.


The final project is the main component of this course. Students are expected to conduct original research and report their findings in a 6-page, workshop-paper-style project report. The project can be a new parallelization technique, a new dynamic analysis, an insightful experimental evaluation of bug-finding techniques on real-world applications, or any other topic of your choice. I will suggest project ideas; however, students are welcome and encouraged to propose their own. Projects must be done individually. All projects are expected to produce a working prototype, which must be demoed in class.

The project will have the following checkpoints:

Choosing a project topic. You will first decide on a project topic. During this phase, you will meet with me to discuss possible ideas.

Project proposal. You will submit a short, three-page document stating (i) the problem that you propose to solve; (ii) why the problem is relevant; (iii) the proposed solution methodology; and (iv) the research challenges that you expect to face. Once you have submitted the project proposal and it has been approved, you will begin work on your project. Please start early and work regularly! Working at the last minute will not suffice.

Midpoint review. By this time, you are expected to have made significant progress toward achieving the goals stated in your project proposal, or to have a clear idea of the difficulties that are hampering your progress. You will meet with me to discuss your progress. You are also expected to have conducted a thorough survey of related work in the area and to have a writeup of that related work (you will reuse this in your final project report as well).

Presentation. You will present your work to the rest of the class. Depending on the number of class projects, we will either have class presentations, posters, or individual presentations to me. Please make your presentation clear and concise. All projects are expected to produce working prototypes, which must be demoed during the presentation.

Submission of final project report. Your final project report must provide a clear description of the problem, your solution, and your evaluation. It must closely mimic the style of a conference paper.

Reading List

Partial list; more coming soon.


  • Software and the Concurrency Revolution by Sutter
  • The Free Lunch is Over: A Fundamental Turn Toward Concurrency in Software by Sutter and Larus
  • Niagara: A 32-Way Multithreaded SPARC Processor, Kongetira et al, IEEE Micro 2005


  • Selecting Locking Primitives for Parallel Programming, Paul E. McKenney, CACM 1996
  • Selecting Locking Designs for Parallel Programs, Paul E. McKenney, Pattern Languages of Program Design
  • [Book] Principles of Parallel Programming, Calvin Lin and Larry Snyder, Addison-Wesley, 2008
  • [Book] Patterns for Parallel Programming, Timothy G Mattson, Beverly A. Sanders, Berna L. Massingill
  • [Book] The Art of Multiprocessor Programming, Maurice Herlihy, Nir Shavit
  • TM

Abstractions and Frameworks

  • Optimistic Parallelism requires Abstractions, PLDI 2007
  • The Implementation of the Cilk-5 Multithreaded Language, PLDI 1998
  • A Java Fork/Join Framework by Doug Lea
  • The Tao of Parallelism in Algorithms, PLDI 2011
  • TBB
  • Intel CT
  • OpenMP

Performance Prediction & Pathology Detection

  • Kremlin: Rethinking and Rebooting gprof for the Multicore Age, PLDI 2011
  • Towards a Holistic Approach to Auto-Parallelization: Integrating Profile-driven Parallelism Detection and Machine Learning based Mapping, PLDI 2009

Bug finding & Dynamic analyses

  • Learning from mistakes: a comprehensive study on real world concurrency bug characteristics, ASPLOS 2008
  • Finding and Reproducing Heisenbugs in Concurrent Programs, OSDI 2008
  • Scalable and Precise Dynamic Datarace Detection for Structured Parallelism, PLDI 2012
  • Eraser: A Dynamic Datarace Detector for Multithreaded Programs, Transactions on Computer Systems, 1997
  • FastTrack: Efficient and Precise Dynamic Race Detection, PLDI 2009
  • Goldilocks: A Race and Transaction-Aware Java Runtime, PLDI 2007
  • A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs, ASPLOS 2010
  • Maple: A Coverage-Driven Testing Tool for Multithreaded Programs, OOPSLA 2012

Memory models & dynamic analyses

  • DRFx: PLDI 2010
  • Conflict Exceptions: ISCA 2010

Deterministic Execution

  • Schedule Memoization
  • DMP
  • Kendo
  • DPJ