Minimizing development and maintenance costs in supporting persistently optimized BLAS
Citations: 217 | References: 17 | Year: 2004
Topics: Engineering, Algorithmic Library, Computer Architecture, Array Computing, Compute Kernel, Data Science, Dense Level 3, Parallel Computing, Massively-parallel Computing, Computer Engineering, Computer Science, Computational Science, Data Center Management, Energy Management, Maintenance Costs, BLAS Implementations, Parallel Programming, Performance-critical APIs, Vectorization
The Basic Linear Algebra Subprograms (BLAS) define one of the most heavily used performance-critical APIs in scientific computing today. It has long been understood that the most important of these routines, the dense Level 3 BLAS, can be written efficiently given a highly optimized general matrix multiply routine (GEMM). In this paper, however, we show that an even larger set of operations can be efficiently maintained using a much simpler matrix multiply kernel. Indeed, this is how our own project, ATLAS (which provides one of the most widely used BLAS implementations today), supports a large variety of performance-critical routines.

Copyright © 2004 John Wiley & Sons, Ltd.
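The layering described in the abstract can be illustrated with a minimal sketch, assuming a kernel that only ever multiplies fixed-size `nb` x `nb` tiles. Here `tile_kernel` is the single piece that would need per-machine tuning; `gemm` builds a general multiply from it by copying (and zero-padding) tiles, and `syrk_lower` builds the symmetric rank-k update C := alpha*A*A^T + beta*C on top of `gemm`, leaving only the small diagonal blocks to dedicated code. All function names and the zero-padding strategy are hypothetical choices made for illustration. This is not ATLAS's code or API; a real implementation would be written in C with a tuned kernel and would handle matrix edges and storage layout with more care.

```python
# Hypothetical sketch: a tile-only "kernel", a GEMM built from it, and a
# SYRK built on that GEMM. Matrices are plain lists of row lists.
# None of these names come from ATLAS or the reference BLAS.


def tile_kernel(a, b, c, nb):
    """c += a @ b for square nb x nb tiles: the only routine that would
    need machine-specific tuning in this layering."""
    for i in range(nb):
        for p in range(nb):
            aip = a[i][p]
            for j in range(nb):
                c[i][j] += aip * b[p][j]


def copy_tile(m, r0, c0, nb):
    """Copy the nb x nb tile of m starting at (r0, c0) into a fresh buffer,
    zero-padding past the matrix edge so the kernel always sees a full tile."""
    rows, cols = len(m), len(m[0])
    return [[m[r0 + i][c0 + j] if r0 + i < rows and c0 + j < cols else 0.0
             for j in range(nb)] for i in range(nb)]


def gemm(alpha, a, b, beta, c, nb):
    """c := alpha*a@b + beta*c for a (m x k), b (k x n), c (m x n),
    computed entirely through tile_kernel()."""
    m, k, n = len(a), len(b), len(b[0])
    for i0 in range(0, m, nb):
        for j0 in range(0, n, nb):
            acc = [[0.0] * nb for _ in range(nb)]
            for p0 in range(0, k, nb):
                tile_kernel(copy_tile(a, i0, p0, nb),
                            copy_tile(b, p0, j0, nb), acc, nb)
            # Write back only the part of the tile that lies inside c.
            for i in range(i0, min(i0 + nb, m)):
                for j in range(j0, min(j0 + nb, n)):
                    c[i][j] = alpha * acc[i - i0][j - j0] + beta * c[i][j]


def transpose(m):
    return [list(row) for row in zip(*m)]


def syrk_lower(alpha, a, beta, c, nb):
    """Lower triangle of c := alpha*a@a^T + beta*c, with a of shape n x k.
    Blocks strictly below the diagonal are delegated to gemm(); only the
    diagonal blocks use a separate triangular loop. The upper triangle of
    c is left untouched."""
    n, k = len(a), len(a[0])
    for i0 in range(0, n, nb):
        i1 = min(i0 + nb, n)
        a_i = a[i0:i1]
        for j0 in range(0, i0, nb):  # i0 is a multiple of nb, so j0 + nb <= i0
            j1 = j0 + nb
            c_blk = [row[j0:j1] for row in c[i0:i1]]
            gemm(alpha, a_i, transpose(a[j0:j1]), beta, c_blk, nb)
            for r, row in enumerate(c_blk):
                c[i0 + r][j0:j1] = row
        for i in range(i0, i1):  # diagonal block, lower triangle only
            for j in range(i0, i + 1):
                s = sum(a[i][p] * a[j][p] for p in range(k))
                c[i][j] = alpha * s + beta * c[i][j]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = [[0.0] * 3 for _ in range(3)]
syrk_lower(1.0, A, 0.0, C, nb=2)
print(C)  # [[5.0, 0.0, 0.0], [11.0, 25.0, 0.0], [17.0, 39.0, 61.0]]
```

Pure Python keeps the structure visible rather than fast. The point of the sketch is that tuning effort concentrates in one small kernel, while `gemm` and `syrk_lower` reuse it unchanged.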