Publication | Closed Access
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures
Citations: 51 | References: 17 | Year: 2010 | Venue: unknown
Keywords: State-of-the-art Multicore Architectures, Engineering, GPU Benchmarking, Quad-core Nehalem, Computer Architecture, Structural Optimization, Supercomputer Architecture, Victoria Falls, Hardware Systems, GPU Computing, Fast Multipole Method, High-performance Architecture, Computer Design, Computing Systems, Computational Electromagnetics, Parallel Computing, Manycore Processor, Antenna, Computer Engineering, Computer Science, GPU Architecture, Hardware Acceleration, Many-core Architecture, Parallel Programming
This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multi-core systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls (dual-sockets on all systems). We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture (Nehalem) reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture.
| Year | Citations |
|---|---|
| 2005 | 5K |
| 1987 | 4.9K |
| 2004 | 490 |
| 1993 | 457 |
| 2008 | 192 |
| 2008 | 181 |
| 2004 | 128 |
| 2008 | 101 |
| 2009 | 95 |
| 2003 | 81 |