Publication | Open Access
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications
Citations: 376
References: 7
Year: 2010
Venue: Unknown
Cluster Computing, Heterogeneous Computing, Engineering, Computer Architecture, Architectural Support, High Performance Computing, Processor Architecture, Hardware Architecture, Complex Hardware Topology, Hardware Security, Managing Hardware Affinities, Hardware Locality, High-performance Architecture, Systems Engineering, Parallel Computing, Manycore Processor, Hybrid HPC Workload, OpenMP Threads, Computer Engineering, Computer Science, HPC Applications, Many-core Architecture, Parallel Programming, Generic Framework, System Software
Increasing core counts, shared caches, and memory nodes create a complex hardware topology that HPC applications must adapt to. We introduce the Hardware Locality (hwloc) software, which gathers information about processors, caches, memory nodes, and other hardware and exposes it to applications and runtime systems in an abstracted, portable, hierarchical manner. With this information, runtime systems can place tasks or adapt communication strategies according to hardware affinities, which may significantly help performance. hwloc can already be used by popular OpenMP and MPI software: scheduling OpenMP threads according to their affinities and placing MPI processes according to their communication patterns both show performance improvements, and an optimized MPI communication strategy can be chosen dynamically based on process location and hardware characteristics.
The increasing numbers of cores, shared caches and memory nodes within machines introduce a complex hardware topology. High-performance computing applications now have to carefully adapt their placement and behavior according to the underlying hierarchy of hardware resources and their software affinities. We introduce the Hardware Locality (hwloc) software, which gathers hardware information about processors, caches, memory nodes and more, and exposes it to applications and runtime systems in an abstracted and portable hierarchical manner. hwloc may significantly help performance by having runtime systems place their tasks or adapt their communication strategies depending on hardware affinities. We show that hwloc can already be used by popular high-performance OpenMP or MPI software. Indeed, scheduling OpenMP threads according to their affinities or placing MPI processes according to their communication patterns shows interesting performance improvements thanks to hwloc. An optimized MPI communication strategy may also be dynamically chosen according to the location of the communicating processes in the machine and its hardware characteristics.
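To make the idea of a hierarchical topology concrete, the following is a minimal sketch in Python of how a runtime might pick a communication strategy from the location of two cores in a hardware tree. It does not use hwloc or reproduce its API: the `Node` class, the toy machine layout (2 memory nodes, each with 2 shared caches of 2 cores), and the strategy labels are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """One object in a toy hardware tree (hypothetical, not an hwloc type)."""
    kind: str                      # "machine", "numa", "cache", or "core"
    index: int
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, kind, index):
        """Create a child object and attach it below this one."""
        child = Node(kind, index, parent=self)
        self.children.append(child)
        return child


def build_toy_machine():
    """Assumed layout: 2 memory nodes x 2 shared caches x 2 cores."""
    machine = Node("machine", 0)
    cores = []
    for n in range(2):
        numa = machine.add("numa", n)
        for c in range(2):
            cache = numa.add("cache", 2 * n + c)
            for _ in range(2):
                cores.append(cache.add("core", len(cores)))
    return machine, cores


def ancestors(node):
    """Return the chain from a node up to the root, starting with the node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def common_ancestor(a, b):
    """Return the deepest object that contains both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes belong to different trees")


# Illustrative labels only; a real runtime would map these to actual protocols.
STRATEGY = {
    "core": "same core",
    "cache": "copy through shared cache",
    "numa": "copy within one memory node",
    "machine": "copy across memory nodes",
}


def pick_strategy(a, b):
    """Choose a strategy from the kind of the deepest shared object."""
    return STRATEGY[common_ancestor(a, b).kind]


if __name__ == "__main__":
    _, cores = build_toy_machine()
    for i, j in [(0, 1), (0, 2), (0, 7)]:
        print(f"core {i} <-> core {j}: {pick_strategy(cores[i], cores[j])}")
```

The design choice here is that the deepest common ancestor of two cores summarizes how "close" they are: the deeper the shared object, the more hardware the two processes have in common. The same lookup could drive task placement as well as communication choices.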