Publication | Closed Access
Parallel Loops on Distributed Machines
Citations: 20
References: 9
Year: 2005
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Data Structures, Parallel Software, High-performance Architecture, Parallel Complexity Theory, Parallel Computing, Parallel Loops, Code Segment, Massively-parallel Computing, Computer Engineering, Computer Science, Distributed Memory Machines, Distributed Processing, Program Analysis, Parallel Processing, Parallel Performance Evaluation, Parallel Programming, Data-level Parallelism, System Software
Any programming environment for distributed memory machines that allows the user to specify parallel do loops over globally defined data structures requires optimizations that go beyond the specification of appropriate data and workload partitionings. In this paper, we consider optimizations that are required for efficient execution of a code segment that consists of parallel loops over distributed data structures. On distributed memory machines it is typically very expensive to fetch individual data elements. Instead, before a parallel loop executes, it is desirable to prefetch all off-processor data required in the loop. We specify a scheme for storing copies of fetched data along with a scheme for accessing copies of off-processor data during the computation of the loop. The performance of such optimizations on the iPSC/2 and the NCUBE is also presented.
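The prefetching idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the scheme the paper specifies: the block partitioning, the dictionary used to hold fetched copies, and every name (`owner`, `inspector`, `prefetch`, `read`, `run_demo`) are hypothetical, and "messages" are simply counted rather than sent. It shows one processor working out which off-processor elements its share of a loop will read, fetching them with one bulk request per owning processor, and then running the loop against local data and the stored copies.

```python
from collections import defaultdict


def owner(g, block):
    """Processor owning global index g under an assumed block partitioning."""
    return g // block


def inspector(my_rank, needed, block):
    """Group the off-processor global indices the loop will read by owner."""
    requests = defaultdict(set)
    for g in needed:
        p = owner(g, block)
        if p != my_rank:
            requests[p].add(g)
    return {p: sorted(idxs) for p, idxs in requests.items()}


def prefetch(requests, memories, block):
    """Simulate one bulk fetch per owner; return the copies and message count."""
    copies = {}
    messages = 0
    for p, idxs in requests.items():
        messages += 1  # one aggregated request instead of one per element
        for g in idxs:
            copies[g] = memories[p][g - p * block]
    return copies, messages


def read(g, my_rank, local, copies, block):
    """Read global element g from local memory or from a prefetched copy."""
    if owner(g, block) == my_rank:
        return local[g - my_rank * block]
    return copies[g]


def run_demo():
    block = 4
    x = [10 * i for i in range(12)]  # global array, 3 processors
    memories = [x[p * block:(p + 1) * block] for p in range(3)]
    my_rank = 1
    my_iters = range(4, 8)  # iterations assigned to processor 1
    idx = {4: 0, 5: 5, 6: 11, 7: 3}  # indirect reads: y[i] = x[i] + x[idx[i]]

    # Collect every index the loop reads, then fetch remote ones up front.
    needed = [g for i in my_iters for g in (i, idx[i])]
    requests = inspector(my_rank, needed, block)
    copies, messages = prefetch(requests, memories, block)

    # Execute the loop; no communication happens inside it.
    local = memories[my_rank]
    y = [read(i, my_rank, local, copies, block)
         + read(idx[i], my_rank, local, copies, block) for i in my_iters]
    return y, requests, messages
```

In this toy case the loop needs three off-processor elements but issues only two aggregated requests, one per owning processor. A dictionary keyed by global index is just one convenient way to store the copies here; the storage and access schemes evaluated in the paper may differ.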