Publication | Closed Access
Improving the memory-system performance of sparse-matrix vector multiplication
Year: 1997 | Citations: 200 | References: 17
Sparse-matrix Vector Multiplication, Increase Instruction-level Parallelism, Array Computing, Engineering, Superscalar RISC Processors, High-performance Architecture, Parallel Performance Evaluation, Computer Engineering, Computer Architecture, Computing Systems, Vector Processing, Parallel Programming, Computer Science, Parallel Computing, Compilers, Hardware Systems, Parallel Algorithms, Vectorization
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a 266-MFLOPS machine. The techniques are applicable to other superscalar RISC processors as well, and have improved performance on a Sun UltraSPARC™ I workstation, for example.
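The abstract lists blocking as one way to cut load instructions. The C sketch below illustrates that general idea by contrasting a plain compressed sparse row (CSR) kernel with a 2×2 register-blocked variant. The block layout, the function names `spmv_csr` and `spmv_bcsr2x2`, and the 2×2 block size are illustrative assumptions, not the paper's code, and the sketch does not cover reordering or prefetching.

```c
#include <assert.h>

/* Plain CSR kernel: y = A*x.
   Each nonzero costs one value load, one column-index load and one x load. */
void spmv_csr(int n, const int *row_ptr, const int *col, const double *val,
              const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}

/* Hypothetical 2x2 block-CSR kernel (illustrative layout, an assumption):
   block row bi covers rows 2*bi and 2*bi+1; bcol[k] is the first column of
   block k; bval stores 4 values per block in row-major order. Blocks are
   assumed aligned to even columns and the matrix dimension to be even. */
void spmv_bcsr2x2(int nb, const int *brow_ptr, const int *bcol,
                  const double *bval, const double *x, double *y) {
    for (int bi = 0; bi < nb; bi++) {
        double y0 = 0.0, y1 = 0.0;          /* two row sums kept in registers */
        for (int k = brow_ptr[bi]; k < brow_ptr[bi + 1]; k++) {
            const double *b = &bval[4 * k];
            int c = bcol[k];                /* one index load per 4 stored values */
            assert(c % 2 == 0);             /* alignment assumption above */
            double x0 = x[c], x1 = x[c + 1]; /* each x value loaded once, used twice */
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * bi] = y0;
        y[2 * bi + 1] = y1;
    }
}
```

For a 4×4 matrix with rows `[1 2 0 0]`, `[3 4 0 0]`, `[0 0 5 0]`, `[0 0 6 7]` and `x = {1, 2, 3, 4}`, both kernels produce `{5, 11, 15, 46}`. The blocked version stores the explicit zero in the lower-right block, trading a few wasted multiply-adds for fewer index and `x` loads.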