Analysis of parallel algorithms
In computer science, analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel – the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.
Background
A so-called work-time (WT) framework (sometimes called work-depth or work-span) was originally introduced by Shiloach and Vishkin[1] for conceptualizing and describing parallel algorithms. In the WT framework, a parallel algorithm is described in two steps. First, it is described in terms of parallel rounds: for each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned, and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is guided by the proof of a scheduling theorem due to Brent,[2] which is explained later in this article. The WT framework is useful because, while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. For example, the WT framework was adopted as the basic presentation framework in the parallel algorithms books (for the parallel random-access machine, or PRAM, model)[3][4] as well as in the class notes.[5] The overview below explains how the WT framework can be used for analyzing more general parallel algorithms, even when their description is not available within the WT framework.
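As an illustration of a round-based description, the following minimal sketch sums a list in parallel rounds and counts the operations and rounds. It is not taken from the cited sources; the function name `wt_sum` and the choice of pairwise addition are assumptions made for the example. Each round is simulated sequentially here, but its additions are independent of one another, and no processors are mentioned.

```python
def wt_sum(values):
    """Sum `values` in parallel rounds; return (total, work, depth).

    work  = total number of additions over all rounds
    depth = number of rounds
    """
    level = list(values)
    work = 0
    depth = 0
    while len(level) > 1:
        # One round: the pairwise additions do not depend on each other,
        # so they could all be performed at the same time.
        pairs = len(level) // 2
        nxt = [level[2 * i] + level[2 * i + 1] for i in range(pairs)]
        if len(level) % 2:
            # An unpaired last element is carried over to the next round.
            nxt.append(level[-1])
        work += pairs
        depth += 1
        level = nxt
    total = level[0] if level else 0
    return total, work, depth


# Eight inputs need 7 additions in 3 rounds.
print(wt_sum(range(1, 9)))  # (36, 7, 3)
```

How the additions of each round are assigned to a given number of processors is exactly the kind of detail that the first step of a WT description leaves out.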
Definitions
Suppose computations are executed on a machine that has $p$ processors. Let $T_p$ denote the time that expires between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:
- The work of a computation executed by $p$ processors is the total number of primitive operations that the processors perform.[6] Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted $T_1$.
- The depth or span is the length of the longest series of operations that have to be performed sequentially due to data dependencies (the critical path). The depth may also be called the critical path length of the computation.[7] Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time.[8] Alternatively, the span can be defined as the time $T_\infty$ spent computing using an idealized machine with an infinite number of processors.[9]
- The cost of the computation is the quantity $pT_p$. This expresses the total time spent, by all processors, in both computing and waiting.[6]
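The work and span of a small computation can be computed directly from its dependency graph. The sketch below is illustrative only: it assumes every operation takes one time unit, and the input format (a dictionary mapping each operation to the operations it depends on) and the name `work_and_span` are hypothetical choices for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def work_and_span(deps):
    """Return (work, span) for a dependency graph of unit-time operations.

    deps maps each operation to the set of operations it depends on.
    """
    order = list(TopologicalSorter(deps).static_order())
    finish = {}  # earliest finishing time of each operation
    for op in order:
        # An operation can finish one step after its latest dependency.
        finish[op] = 1 + max((finish[d] for d in deps.get(op, ())), default=0)
    work = len(order)                        # total number of operations
    span = max(finish.values(), default=0)   # length of the critical path
    return work, span


# (a + b) + (c + d): two independent additions feed a third one.
example = {"s1": set(), "s2": set(), "s3": {"s1", "s2"}}
print(work_and_span(example))  # (3, 2)
```

In this example the work is 3 and the span is 2; running it on 2 processors in 2 time steps gives a cost of 2 · 2 = 4.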
Several useful results follow from the definitions of work, span and cost:
- Work law. The cost is always at least the work: $pT_p \geq T_1$. This follows from the fact that $p$ processors can perform at most $p$ operations in parallel.[6][9]
- Span law. A finite number $p$ of processors cannot outperform an infinite number, so that $T_p \geq T_\infty$.[9]
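As a worked illustration of the two laws (an example chosen here, not taken from the cited sources), consider summing $n = 8$ numbers with a balanced tree of additions on $p = 4$ processors, assuming each addition takes one time unit. The three levels of the tree contain 4, 2 and 1 additions, so each level fits into a single step:

```latex
\begin{align*}
T_1 &= n - 1 = 7, \qquad T_\infty = \log_2 n = 3, \qquad T_4 = 3 \\
\text{Work law:}\quad p\,T_p &= 4 \cdot 3 = 12 \;\geq\; 7 = T_1 \\
\text{Span law:}\quad T_p &= 3 \;\geq\; 3 = T_\infty
\end{align*}
```

The cost of 12 exceeds the work of 7 because some processors are idle in the second and third steps.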
Using these definitions and laws, the following measures of performance can be given:
- Speedup is the gain in speed made by parallel execution compared to sequential execution: $S_p = T_1 / T_p$. When the speedup is $\Omega(p)$ for $p$ processors (using big O notation), the speedup is linear, which is optimal in simple models of computation because the work law implies that $T_1 / T_p \leq p$ (super-linear speedup can occur in practice due to memory hierarchy effects). The situation $T_1 / T_p = p$ is called perfect linear speedup.[9] An algorithm that exhibits linear speedup is said to be scalable.[6] Analytical expressions for the speedup of many important parallel algorithms have been published in book form.[10]
- Efficiency is the speedup per processor, $S_p / p$.[6]
- Parallelism is the ratio $T_1 / T_\infty$. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if $p > T_1 / T_\infty$, then $T_1 / T_p \leq T_1 / T_\infty < p$.[9]
- The slackness is $T_1 / (pT_\infty)$. A slackness less than one implies (by the span law) that perfect linear speedup is impossible on $p$ processors.[9]
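The four measures above are simple ratios of the work, the span, the running time on the given machine, and the processor count. The following sketch computes them with exact fractions; the function name `parallel_metrics` and its parameter names are hypothetical, and the inputs are assumed to be positive integers measured in the same time unit.

```python
from fractions import Fraction


def parallel_metrics(t1, tp, t_inf, p):
    """Return speedup, efficiency, parallelism and slackness.

    t1    = work (time on one processor)
    tp    = running time on p processors
    t_inf = span (time on unboundedly many processors)
    """
    speedup = Fraction(t1, tp)
    return {
        "speedup": speedup,
        "efficiency": speedup / p,
        "parallelism": Fraction(t1, t_inf),
        "slackness": Fraction(t1, p * t_inf),
    }


# Balanced-tree sum of 8 numbers on 4 processors: work 7, span 3, 3 steps.
m = parallel_metrics(t1=7, tp=3, t_inf=3, p=4)
print(m["speedup"], m["efficiency"], m["parallelism"], m["slackness"])
# 7/3 7/12 7/3 7/12
```

Here the slackness is 7/12, which is less than one, so perfect linear speedup is not possible on 4 processors for this computation.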
Execution on a limited number of processors
Analysis of parallel algorithms is usually carried out under the assumption that an unbounded number of processors is available. This is unrealistic, but not a problem, since any computation that can run in parallel on $N$ processors can be executed on $p < N$ processors by letting each processor execute multiple units of work. A result called Brent's law states that one can perform such a "simulation" in time $T_p$, bounded by[11]
$$T_p \leq T_N + \frac{T_1 - T_N}{p},$$
or, less precisely,[6]
$$T_p = O\left(T_N + \frac{T_1}{p}\right).$$
An alternative statement of the law bounds $T_p$ above and below by
$$\frac{T_1}{p} \leq T_p \leq \frac{T_1}{p} + T_\infty,$$
showing that the span (depth) $T_\infty$ and the work $T_1$ together provide reasonable bounds on the computation time.[2]
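The lower and upper bounds in terms of work and span can be checked experimentally on a small example. The sketch below simulates a greedy scheduler, which in every time step runs as many ready operations as there are processors. The greedy strategy, the unit-time operations, the input format and the names `greedy_schedule_time` and `TREE` are all assumptions made for this illustration; they are not taken from the cited sources.

```python
def greedy_schedule_time(deps, p):
    """Return the number of unit time steps a greedy scheduler needs.

    deps maps each operation to the operations it depends on.
    In every step, up to p operations whose dependencies are done are run.
    """
    if p < 1:
        raise ValueError("need at least one processor")
    remaining = {op: set(d) for op, d in deps.items()}
    # Operations that only appear as dependencies have no dependencies.
    for d in list(remaining.values()):
        for op in d:
            remaining.setdefault(op, set())
    done = set()
    steps = 0
    while remaining:
        # Ready = all dependencies finished in an earlier step.
        ready = [op for op, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError("dependency cycle")
        batch = ready[:p]
        for op in batch:
            del remaining[op]
        done.update(batch)
        steps += 1
    return steps


# Balanced addition tree for 8 inputs: 4 + 2 + 1 = 7 operations.
TREE = {
    "a1": [], "a2": [], "a3": [], "a4": [],
    "b1": ["a1", "a2"], "b2": ["a3", "a4"],
    "c": ["b1", "b2"],
}

t1 = greedy_schedule_time(TREE, 1)             # work: one processor
t_inf = greedy_schedule_time(TREE, len(TREE))  # span: never short of processors
for p in range(1, len(TREE) + 1):
    tp = greedy_schedule_time(TREE, p)
    # Work/span bounds on the running time with p processors.
    assert t1 / p <= tp <= t1 / p + t_inf
    print(p, tp)
```

For this tree the work is 7 and the span is 3; with 2 processors the simulated schedule takes 4 steps, which lies between 7/2 = 3.5 and 7/2 + 3 = 6.5.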
References
- [1] Template:Cite journal
- [2] Template:Cite journal
- [3] Template:Cite book
- [4] Template:Cite book
- [5] Template:Cite book
- [6] Template:Cite book
- [7] Template:Cite journal
- [8] Template:Cite book
- [9] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms. MIT Press.
- [10] Template:Cite book
- [11] Template:Cite encyclopedia