Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

Abstract

A graph neural network (GNN) enables deep learning on structured graph data. GNN training faces two major obstacles: 1) it relies on high-end servers with many GPUs, which are expensive to purchase and maintain, and 2) the limited memory on GPUs cannot scale to today's billion-edge graphs. This paper presents Dorylus, a distributed system for training GNNs. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost.

The key insight guiding our design is computation separation. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline in which graph-parallel and tensor-parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas (serverless threads). With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. For large graphs, CPU servers currently offer better performance-per-dollar than GPU servers. Simply adding Lambdas on top of CPU servers offers up to 2.75x more performance-per-dollar than training with CPU servers alone. Concretely, Dorylus is 1.22x faster and 4.83x cheaper than GPU servers for massive sparse graphs. Dorylus is up to 3.8x faster and 10.7x cheaper than existing sampling-based systems.