The Use of BLAS3 in Linear Algebra on a Parallel Processor with a Hierarchical Memory

Abstract

Kyle Gallivan, William Jalby, and Ulrike Meier
SIAM Journal on Scientific and Statistical Computing, Volume 8, Issue 6 (1987), pp. 1079-1084
https://doi.org/10.1137/0908086
Submitted: 20 October 1986. Accepted: 21 January 1987. Published online: 14 July 2006.
Copyright © 1987 Society for Industrial and Applied Mathematics
ISSN (print): 0196-5204. ISSN (online): 2168-3417.
Keywords: BLAS3 or third-level BLAS, numerical linear algebra, numerical software, parallel computing, cache management
MSC codes: 68B99, 65F05, 65F25, 65F30

This note describes work at CSRD which shows that a third level of the BLAS (BLAS3) is needed to achieve high-performance on multivector processors with a shared hierarchical memory.
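
The abstract states the conclusion without the mechanism. One common way to see why matrix-matrix (third-level) kernels suit a hierarchical memory is to compare how much arithmetic each BLAS level performs per data element it touches, and to note that a matrix product can be computed tile by tile so that each tile is reused while it sits in fast memory. The sketch below is illustrative only and is not taken from the note: `blocked_matmul`, `flops_per_word`, and the `block` parameter are hypothetical names, and the operation counts are the standard textbook ones for square problems of order n.

```python
def blocked_matmul(a, b, block=2):
    """Return the product of two square lists-of-lists, computed tile by tile."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # Each (ii, jj, kk) step works on three tiles of at most
                # block x block elements; these are the data one would
                # like to keep in the fast level of the memory hierarchy.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c


def flops_per_word(level, n):
    """Approximate flops per element touched for a typical kernel at each level."""
    if level == 1:
        # y <- alpha*x + y: 2n flops; x and y read, y written (3n elements).
        return (2 * n) / (3 * n)
    if level == 2:
        # y <- A*x + y: 2n^2 flops; A, x, y read, y written (n^2 + 3n elements).
        return (2 * n * n) / (n * n + 3 * n)
    if level == 3:
        # C <- A*B + C: 2n^3 flops; A, B, C read, C written (4n^2 elements).
        return (2 * n ** 3) / (4 * n * n)
    raise ValueError("level must be 1, 2, or 3")
```

Under these counts, the first two levels stay below 2 flops per element regardless of n, while the third level grows as n/2 (50 for n = 100). That growth is what leaves room for reuse in a cache or local memory, provided the tile size is chosen to fit it.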
