Parallel computing

Article snapshot taken from Wikipedia, used under the Creative Commons Attribution-ShareAlike license.

Parallel computing is the simultaneous execution of some combination of multiple instances of programmed instructions and data on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. The technique was first put to practical use by ILLIAC IV in 1976, fully a decade after it was conceived.

Definition

A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own processor packaging, but recently introduced multicore processors contain multiple logical processors in a single package. There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory. Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures, classifies parallel (and serial) computers according to whether all processors execute the same instructions at the same time (single instruction/multiple data, or SIMD) or whether each processor executes different instructions (multiple instruction/multiple data, or MIMD).

One major way to classify parallel computers is based on their memory architectures. Shared memory parallel computers have multiple processors accessing all available memory as a global address space. They can be further divided into two main classes based on memory access times: Uniform Memory Access (UMA), in which access times to all parts of memory are equal, and Non-Uniform Memory Access (NUMA), in which they are not. Distributed memory parallel computers also have multiple processors, but each processor can only access its own local memory; no global memory address space exists across them. Parallel computing systems can also be categorized by the number of processors in them; systems with thousands of processors are known as massively parallel. Systems are also sometimes described as "large scale" or "small scale" parallel processors, depending on their size: a PC-based parallel system, for example, would generally be considered small scale. Parallel processor machines are also divided into symmetric and asymmetric multiprocessors, depending on whether all the processors are the same or not (for instance, if only one is capable of running the operating system code while the others are less privileged).

A variety of architectures have been developed for parallel processing. For example, a ring architecture has its processors linked in a ring. Other architectures include hypercubes, fat trees, systolic arrays, and so on.

Theory and practice

Parallel computers can be modelled as Parallel Random Access Machines (PRAMs). The PRAM model ignores the cost of interconnection between the constituent computing units, but is nevertheless very useful in providing upper bounds on the parallel solvability of many problems. In reality the interconnection plays a significant role. The processors may communicate and cooperate in solving a problem or they may run independently, often under the control of another processor which distributes work to and collects results from them (a "processor farm").

Processors in a parallel computer may communicate with each other in a number of ways, including shared (either multiported or multiplexed) memory, a crossbar, a shared bus, or an interconnect network in any of a myriad of topologies, including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), or an n-dimensional mesh. Parallel computers based on an interconnect network need to employ some kind of routing to enable the passing of messages between nodes that are not directly connected. The communication medium between the processors is likely to be hierarchical in large multiprocessor machines. Similarly, memory may be either private to the processor, shared between a number of processors, or globally shared. A systolic array is an example of a multiprocessor with fixed-function nodes, local-only memory and no message routing.

Approaches to parallel computing include multiprocessing, parallel supercomputers, NUMA vs. SMP vs. massively parallel computer systems, and distributed computing (especially computer clusters and grid computing). According to Amdahl's law, from a purely computational perspective, parallel processing on x processors is less efficient than a single x-times-faster processor, because the serial portion of a program limits the achievable speedup. However, since power consumption is a super-linear function of clock frequency on modern processors, we are reaching the point where, from an energy-cost perspective, it can be cheaper to run many low-speed processors in parallel than a single highly clocked processor.
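
As an illustration of the Amdahl's law bound mentioned above, the following Python sketch (the function name and sample figures are illustrative, not taken from this article) computes the maximum speedup for a program with a given serial fraction on a given number of processors:

    def amdahl_speedup(serial_fraction, processors):
        """Upper bound on speedup predicted by Amdahl's law.

        serial_fraction: portion of the run time that cannot be parallelized (0..1)
        processors: number of processors applied to the parallel portion
        """
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

    # Example: a program that is 10% serial cannot exceed a 10x speedup,
    # no matter how many processors are added.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.10, n), 2))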

Terminology

Some frequently used terms in parallel computing are:

Efficiency
the execution time using a single processor divided by the product of the execution time using a multiprocessor and the number of processors (see the sketch after this list).
Parallel Overhead
the extra work associated with a parallel version compared to its sequential code, mostly the extra CPU time and memory space required for synchronization, data communications, parallel environment creation and cancellation, etc. See also Amdahl's law.
Synchronization
the coordination of simultaneous tasks to ensure correctness and avoid unexpected race conditions.
Speedup
also called parallel speedup, defined as the wall-clock time of the best serial execution divided by the wall-clock time of the parallel execution. Amdahl's law can be used to give a maximum speedup factor.
Scalability
a parallel system's ability to gain a proportionate increase in parallel speedup with the addition of more processors. See also the Parallel Computing Glossary.
Task
a logically high-level, discrete, independent section of computational work. A task is typically executed by a processor as a program.
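
To make the Efficiency and Speedup definitions above concrete, here is a minimal Python sketch; the timing figures are made-up examples rather than measurements:

    def speedup(serial_time, parallel_time):
        """Parallel speedup: best serial wall-clock time / parallel wall-clock time."""
        return serial_time / parallel_time

    def efficiency(serial_time, parallel_time, processors):
        """Efficiency: serial time divided by (parallel time * number of processors)."""
        return serial_time / (parallel_time * processors)

    # Hypothetical measurements: 120 s serially, 20 s on 8 processors.
    print(speedup(120.0, 20.0))        # 6.0x speedup
    print(efficiency(120.0, 20.0, 8))  # 0.75 efficiency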

Algorithms

Parallel algorithms can be constructed by redesigning serial algorithms to make effective use of parallel hardware. However, not all algorithms can be parallelized. This is summed up in a famous saying:

One woman can have a baby in nine months, but nine women can't have a baby in one month.

In practice, linear speedup (i.e., speedup proportional to the number of processors) is very difficult to achieve, because many algorithms are essentially sequential in nature; Amdahl's law states this more formally. Certain workloads can benefit from pipeline parallelism when extra processors are added. This uses a factory assembly-line approach to divide the work: if the work can be divided into n stages where a discrete deliverable is passed from stage to stage, then up to n processors can be used. However, the slowest stage will hold up the others, so it is rare to be able to fully use all n processors.
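
The assembly-line behaviour described above can be sketched in Python with threads and queues. The stage timings and the use of sleep as a stand-in for work are assumptions for illustration; in CPython a real speedup would also require the stages to release the global interpreter lock (for example, by doing I/O or calling native code):

    import queue
    import threading
    import time

    def stage(work_seconds, inbox, outbox):
        """Repeatedly take an item from inbox, 'process' it, and pass it on."""
        while True:
            item = inbox.get()
            if item is None:          # sentinel: shut this stage down
                outbox.put(None)
                break
            time.sleep(work_seconds)  # stand-in for real per-stage work
            outbox.put(item)

    # Three stages; the slowest one (0.3 s) bounds the pipeline's throughput.
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(0.1, q0, q1)),  # stage 1
        threading.Thread(target=stage, args=(0.3, q1, q2)),  # stage 2 (slowest)
        threading.Thread(target=stage, args=(0.1, q2, q3)),  # stage 3
    ]
    for t in threads:
        t.start()

    for item in range(10):            # feed work into the first stage
        q0.put(item)
    q0.put(None)                      # then the shutdown sentinel

    results = []
    while (finished := q3.get()) is not None:
        results.append(finished)
    for t in threads:
        t.join()
    print(results)                    # items emerge in order after flowing through all stages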

Parallel problems

Well known parallel software problem sets include embarrassingly parallel and Grand Challenge problems.

Parallel programming

Parallel programming is the design, implementation, and tuning of parallel computer programs which take advantage of parallel computing systems. It also refers to the application of parallel programming methods to existing serial programs (parallelization). Parallel programming focuses on partitioning the overall problem into separate tasks, allocating tasks to processors and synchronizing the tasks to get meaningful results. Parallel programming can only be applied to problems that are inherently parallelizable, mostly without data dependence. A problem can be partitioned based on domain decomposition or functional decomposition, or a combination.

There are two major approaches to parallel programming: implicit parallelism, where the system (the compiler or some other program) partitions the problem and allocates tasks to processors automatically (also called automatic parallelizing compilers); or explicit parallelism, where the programmer must annotate their program to show how it is to be partitioned. Many factors and techniques impact the performance of parallel programming, especially load balancing, which attempts to keep all processors busy by moving tasks from heavily loaded processors to less loaded ones.
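
As an illustration of explicit parallelism with domain decomposition, the following Python sketch splits a range of integers into chunks that worker processes handle independently; the function, chunk size and process count are illustrative choices, and the pool's hand-out of chunks to idle workers provides only a crude form of load balancing:

    from multiprocessing import Pool

    def partial_sum_of_squares(chunk):
        """Work on one sub-domain: a contiguous range of integers."""
        start, stop = chunk
        return sum(i * i for i in range(start, stop))

    if __name__ == "__main__":
        n = 1_000_000
        # Domain decomposition: split [0, n) into chunks of 100,000 integers.
        chunks = [(i, min(i + 100_000, n)) for i in range(0, n, 100_000)]
        with Pool(processes=4) as pool:
            # Each idle worker picks up the next chunk, keeping all processors busy.
            total = sum(pool.map(partial_sum_of_squares, chunks))
        print(total)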

Some people consider parallel programming to be synonymous with concurrent programming. Others draw a distinction between parallel programming, which uses well-defined and structured patterns of communications between processes and focuses on parallel execution of processes to enhance throughput, and concurrent programming, which typically involves defining new patterns of communication between processes that may have been made concurrent for reasons other than performance. In either case, communication between processes is performed either via shared memory or with message passing, either of which may be implemented in terms of the other.

Programs that work correctly on a single-CPU system may not do so in a parallel environment, because multiple copies of the same program may interfere with each other, for instance by accessing the same memory location at the same time. Careful programming (synchronization) is therefore required in a parallel system.
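
The following minimal Python sketch illustrates this kind of interference: several threads increment a shared counter, and because the increment is not atomic, updates can be lost unless a lock is used. Whether the race is actually observed depends on the interpreter and the scheduling; the synchronized version is correct in every case:

    import threading

    counter = 0
    lock = threading.Lock()

    def unsafe_increment(times):
        global counter
        for _ in range(times):
            counter += 1          # read-modify-write: a race window

    def safe_increment(times):
        global counter
        for _ in range(times):
            with lock:            # synchronization: one thread at a time
                counter += 1

    def run(worker):
        global counter
        counter = 0
        threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return counter

    print("unsynchronized:", run(unsafe_increment))  # may be less than 400000
    print("synchronized:  ", run(safe_increment))    # always 400000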

Parallel programming models

Main article: Parallel programming model

A parallel programming model is a computing architecture and language designed to express parallelism in software systems and applications. The software that supports these models includes compilers, libraries and other tools that enable applications to use parallel hardware.

Parallel models are implemented in several ways: as libraries invoked from traditional sequential languages, as language extensions, or as complete new execution models. They are also roughly categorized for two kinds of systems: shared memory systems and distributed memory systems, though the lines between them are largely blurred nowadays.
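
As a sketch of the first category (a library invoked from a traditional sequential language) using a message-passing style, the following Python example exchanges tasks and results between processes through queues rather than shared memory; the worker function and message contents are illustrative assumptions:

    from multiprocessing import Process, Queue

    def worker(tasks, results):
        """Receive task messages, send result messages; no memory is shared."""
        while True:
            msg = tasks.get()
            if msg is None:           # sentinel: stop this worker
                break
            results.put(msg * msg)    # send the result back as a message

    if __name__ == "__main__":
        tasks, results = Queue(), Queue()
        workers = [Process(target=worker, args=(tasks, results)) for _ in range(2)]
        for p in workers:
            p.start()
        for i in range(10):
            tasks.put(i)              # message passing: tasks out...
        for _ in workers:
            tasks.put(None)           # ...then one sentinel per worker
        print(sorted(results.get() for _ in range(10)))
        for p in workers:
            p.join()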

Further reading

  • Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing (2003), ISBN 0-201-64865-2
  • Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill, Patterns for Parallel Programming (2005), ISBN 0-321-22811-1
  • Barry Wilkinson and Michael Allen, Parallel Programming (2005), ISBN 0-13-140563-2

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
