Publication | Closed Access
NeutronStar: Distributed GNN Training with Hybrid Dependency Management
Year: 2022 | Citations: 47 | References: 23
Keywords: Cluster Computing, Engineering, Machine Learning, Computer Architecture, Recurrent Neural Network, Graph Processing, GPU Computing, Data Science, Dependencies-cached Approach, Dependencies-communicated Approach, Parallel Computing, Computer Engineering, Computer Science, Distributed Learning, GPU Cluster, Neural Architecture Search, Hybrid Dependency Management, Vertex Dependencies, Parallel Programming, Graph Neural Network
GNN training must resolve vertex dependencies: each vertex's representation update depends on its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Through extensive experiments and analysis, we find that which approach performs best depends on a set of factors, including the input graph, the model configuration, and the underlying computing cluster environment. A system that supports diverse GNN training workloads with only one approach often delivers suboptimal performance. We analyze these factors for each GNN training job before execution in order to choose the best-fit approach. We propose a hybrid dependency-handling approach that adaptively combines the merits of the two approaches at runtime. Based on this hybrid approach, we develop a distributed GNN training system called NeutronStar, which automatically delivers high-performance GNN training. NeutronStar is further equipped with effective optimizations for CPU-GPU computation and data processing. Experimental results on a 16-node Aliyun cluster demonstrate that NeutronStar achieves 1.81× to 14.25× speedup over existing GNN systems, including DistDGL and ROC.
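The cached-versus-communicated trade-off can be illustrated with a toy per-dependency cost comparison. The Python sketch below is not NeutronStar's actual decision procedure; the cost model and the names `CostParams`, `cache_cost`, `comm_cost`, and `choose_strategy`, along with all numbers, are hypothetical. It assumes that caching a remote vertex means replicating its multi-hop dependency subgraph and recomputing it locally, while communicating means fetching its embedding from the owning worker once per GNN layer.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    """Hypothetical per-unit costs in arbitrary units (illustration only)."""
    compute_per_edge: float     # cost of aggregating one replicated edge locally
    comm_per_embedding: float   # cost of fetching one remote embedding over the network


def cache_cost(replicated_edges: int, p: CostParams) -> float:
    # Dependencies-cached: replicate the remote vertex's multi-hop dependency
    # subgraph and recompute its representation locally (redundant computation).
    return replicated_edges * p.compute_per_edge


def comm_cost(num_layers: int, p: CostParams) -> float:
    # Dependencies-communicated: fetch the remote vertex's embedding once per
    # GNN layer (forward pass only in this toy model).
    return num_layers * p.comm_per_embedding


def choose_strategy(remote_deps: dict[int, int], num_layers: int,
                    p: CostParams) -> dict[int, str]:
    """Map each remote vertex id to 'cache' or 'communicate'.

    remote_deps[v] is the number of edges that would have to be replicated
    to recompute vertex v locally. Ties go to 'communicate'.
    """
    plan = {}
    for v, edges in remote_deps.items():
        if cache_cost(edges, p) < comm_cost(num_layers, p):
            plan[v] = "cache"
        else:
            plan[v] = "communicate"
    return plan


if __name__ == "__main__":
    params = CostParams(compute_per_edge=1.0, comm_per_embedding=50.0)
    # Vertex 7 has a small dependency subgraph; vertex 9 has a large one.
    print(choose_strategy({7: 20, 9: 500}, num_layers=2, p=params))
    # → {7: 'cache', 9: 'communicate'}
```

Raising `comm_per_embedding` (a slower network) or shrinking the replicated subgraphs (a sparser graph or fewer layers) shifts more vertices toward caching, which mirrors the observation that the graph input, model configuration, and cluster environment jointly determine the best choice.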