Publication | Closed Access
A massively parallel adaptive fast-multipole method on heterogeneous architectures
Citations: 95
References: 19
Year: 2009
Unknown Venue
Numerical Analysis, Cluster Computing, Engineering, Distributed Memory Parallelism, Computer Architecture, New Scalable Algorithms, GPU Computing, Hardware Security, Array Computing, Compute Kernel, Heterogeneous Architectures, Computational Electromagnetics, Parallel Computing, Computational Geometry, Massively-parallel Computing, Computer Engineering, Computer Science, Memory/streaming Parallelism, GPU Cluster, Computational Science, Hardware Acceleration, Parallel Processing, Parallel Programming
We present new scalable algorithms and a new implementation of our kernel-independent fast multipole method (Ying et al., ACM/IEEE SC '03), in which we employ both distributed memory parallelism (via MPI) and shared memory/streaming parallelism (via GPU acceleration) to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (AMD/CRAY-based Kraken system at NSF/NICS) for highly non-uniform point distributions. On GPU-enabled systems, we achieve a 30x speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation.
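For context, the quantity a fast multipole method accelerates is a sum of pairwise interactions over all points. The sketch below is not the authors' method. It is the naive O(N^2) direct summation that an FMM is designed to replace, and it assumes the 3D Laplace kernel 1/(4*pi*r) as one example of a non-oscillatory two-body potential. The function name `direct_potential` and its arguments are hypothetical, chosen only for illustration.

```python
import math

def direct_potential(points, charges):
    """Naive O(N^2) baseline (illustrative, not the paper's algorithm).

    Computes phi_i = sum over j != i of q_j / (4 * pi * |x_i - x_j|),
    assuming the 3D Laplace kernel as an example potential.
    """
    n = len(points)
    phi = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            r = math.dist(points[i], points[j])
            phi[i] += charges[j] / (4.0 * math.pi * r)
    return phi

# Two unit-separated points with charges 1 and 2.
phi = direct_potential([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 2.0])
```

The double loop touches every pair of points, so the cost grows quadratically with the number of points. That growth is why direct evaluation becomes impractical at the problem sizes quoted in the abstract.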