Platforms: OpenMP on Shared Memory Architectures

Contents


OpenMP Overview

What is OpenMP?

Why use OpenMP?

OpenMP Programming Model

Much more information on OpenMP can be found in the sources listed at the end of this document.


OpenMP Compliant Shared Memory Architectures

Background

Popular Architectures

The shared memory architecture consists of a number of processors that each have access to a global memory store via some interconnect or bus. The key feature is the use of a single address space across the whole memory system, so that all the processors have the same view of memory. The processors communicate with one another by one processor writing data into a location in memory and another processor reading that data. In this type of architecture the time to access any piece of data is uniform, since all of the communication goes through the bus. The advantage of this architecture is that it is easy to program: there are no explicit communications between processors, because the communication is handled via the global memory store.
(Picture taken from anusf.anu.edu.au/~dbs900/OpenMP/openmp)
Since different threads communicate with each other by reading and writing shared memory, the latencies involved in these ‘communications’ are an important factor for overall performance. Two different kinds of latency can be distinguished: first, the latency to access main memory; second, the latency incurred in direct ‘communication’ between two threads.


Three of the main platforms for shared memory architectures are presented below:

Multi-Core PCs

Processors have been consistently getting faster. But the more rapidly they can perform instructions, the quicker they need to receive the values of operands from memory. Unfortunately, the speed with which data can be read from and written to memory has not increased at the same rate. In response, the vendors have built computers with hierarchical memory systems, in which a small, expensive, and very fast memory called cache memory, or “cache” for short, supplies the processor with data and instructions at high rates. Each processor of an SMP needs its own private cache if it is to be fed quickly; hence, not all memory is shared. Data is copied into cache from main memory: blocks of consecutive memory locations are transferred at a time. Since the cache is very small in comparison to main memory, a new block may displace data that was previously copied in. An operation can be (almost) immediately performed if the values it needs are available in cache. But if they are not, there will be a delay while the corresponding data is retrieved from main memory. Hence, it is important to manage cache carefully.

(Picture taken from- http://mitpress.mit.edu/books/chapters/0262533022chap1)
A processor is basically a unit that reads and executes program instructions, which are fixed-length (typically 32- or 64-bit) or variable-length chunks of data. The data in the instruction tells the processor what to do. The instructions are very basic things like reading data from memory or sending data to the user display, but they are processed so rapidly that we experience the results as the smooth operation of a program.

A core is the part of the processor that performs the reading and executing of instructions. A single-core processor can only execute one instruction at a time. As the name implies, multi-core processors are composed of more than one core; a very common example is a dual-core processor. The advantage of a multi-core processor over a single-core one is that it can either use all of its cores to accomplish a single task, or it can spawn threads that divide the task between its cores, so that (in the ideal case) a dual-core processor completes the task in half the time a single-core processor would. Multi-core processors can also execute multiple tasks at the same time; a common example is watching a movie in Windows Media Player while your dual-core processor runs a background virus check. A multi-core processor is a shared memory processor: all cores share the same memory, and in a multi-core architecture all cores are on the same chip.

OpenMP divides work among threads; a thread is the smallest unit of processing that can be scheduled by an operating system. The master thread assigns tasks to worker threads, which then execute them in parallel using the multiple cores of a processor.

SGI Origin2000

This is effectively a hybrid shared and distributed memory architecture. The memory is physically distributed across nodes, with the two processors located at each node having equal access to their local memory. It is a shared memory platform in the sense that all other nodes have similar access to this memory but are physically more distant; it can still be programmed as a symmetric multi-processor (SMP) machine. As the number of nodes accessing this memory increases, a bottleneck will arise, but this is a limitation one would expect.

Sun HPC Servers

Servers such as the Enterprise 3000 or Enterprise 10000. These are true shared memory machines, with the E3000 containing 1 to 6 processors and the E10000 between 4 and 64 processors.


Citation/Sources Used

Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost, and Ruud van der Pas.
OpenMP: A Parallel Programming Model for Shared Memory Architectures, by Paul Graham.
OpenMP Tutorials, by Blaise Barney, Lawrence Livermore National Laboratory.