Publication | Open Access
STS-k
Citations: 21
References: 8
Year: 2015
Unknown Venue
Cluster Computing, Latency Reduction, Graph Model, Engineering, Massively-parallel Computing, Data-level Parallelism, Parallel Performance Evaluation, Many-core Architecture, Computer Architecture, Computer Engineering, Computational Complexity, Parallel Programming, Computer Science, Parallel Computing, Manycore Processor, Earlier Coloring
We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations that reduce latency by increasing the spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-EX and 24-core AMD 'MagnyCours' processors. For a fixed ordering, the incremental gains solely from the 3-level transformations in STS-3 correspond to reductions in execution time by factors of 1.4 (Intel) and 1.5 (AMD) for level sets, and 2 (Intel) and 2.2 (AMD) for coloring. On average, execution times are reduced by factors of 6 (Intel) and 4 (AMD) for STS-3 with coloring compared to a reference implementation using level sets.
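For readers unfamiliar with the level set scheme that the abstract names as its reference point, the following is a minimal sketch of that baseline idea only, not of STS-k or its transformations. It assumes a sparse lower-triangular matrix stored in CSR form with an explicit nonzero diagonal; the function names `level_sets` and `solve_by_levels` and the small example matrix are hypothetical and chosen for illustration. Rows in the same level have no dependencies on one another, so each level's loop is the part a parallel implementation would distribute across cores.

```python
def level_sets(indptr, indices):
    """Group the rows of a sparse lower-triangular CSR matrix into level sets.

    Row i belongs to level 1 + max(level of every row j < i that it references),
    or level 0 if it references only its own diagonal entry.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_by_levels(indptr, indices, data, b):
    """Solve L x = b by forward substitution, one level set at a time."""
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows within one level are mutually independent; this inner loop
        # is where a parallel implementation would assign rows to cores.
        for i in rows:
            s = b[i]
            diag = None
            for p in range(indptr[i], indptr[i + 1]):
                j = indices[p]
                if j == i:
                    diag = data[p]
                else:
                    s -= data[p] * x[j]
            x[i] = s / diag
    return x


# Hypothetical 4x4 example:
#   [2 0 0 0]
#   [0 4 0 0]
#   [1 0 1 0]
#   [0 2 1 2]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 2.0, 1.0, 2.0]
b = [2.0, 8.0, 4.0, 15.0]

print(level_sets(indptr, indices))
print(solve_by_levels(indptr, indices, data, b))
```

In this example rows 0 and 1 can be solved concurrently, while rows 2 and 3 each wait on an earlier level. The number of levels bounds the number of synchronization points, which is one reason the choice of ordering (level sets versus coloring) affects performance.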