Publication | Closed Access

# Software-hardware co-design for fast and scalable training of deep learning recommendation models

Citations: 116 · References: 46 · Year: 2022 · Venue: Unknown
Keywords: Artificial Intelligence, Engineering, Machine Learning, Computer Architecture, Scalable Training, Recommendation Models, Data Parallelism, Data Science, Computing Systems, Embedded Machine Learning, Parallel Computing, Large AI Models, Network Flows, Computer Engineering, Computer Science, Deep Learning, Neural Architecture Search, ZionEX Nodes, Model Compression, Hardware Acceleration, Software-hardware Co-design, Parallelism Strategy, Parallel Programming
Deep learning recommendation models (DLRMs) have been used across many business-critical services at Meta and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper, we present Neo, a software-hardware co-designed system for high-performance distributed training of large-scale DLRMs. Neo employs a novel 4D parallelism strategy that combines table-wise, row-wise, column-wise, and data parallelism for training massive embedding operators in DLRMs. In addition, Neo enables extremely high-performance and memory-efficient embedding computations using a variety of critical systems optimizations, including hybrid kernel fusion, software-managed caching, and quality-preserving compression. Finally, Neo is paired with ZionEX, a new hardware platform co-designed with Neo's 4D parallelism for optimizing communications for large-scale DLRM training. Our evaluation on 128 GPUs using 16 ZionEX nodes shows that Neo outperforms existing systems by up to 40× for training 12-trillion-parameter DLRM models deployed in production.
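To make the 4D parallelism idea concrete, the sketch below shows one plausible placement heuristic in the spirit the abstract describes: tiny tables are replicated (data parallelism), very tall tables are sharded by row, very wide tables are split along the embedding dimension, and the rest are placed whole on a single device (table-wise). All thresholds, function names, and table names here are illustrative assumptions, not Neo's actual algorithm.

```python
# Hypothetical sketch of a 4D-parallelism placement heuristic for DLRM
# embedding tables (table-wise / row-wise / column-wise / data parallelism).
# Thresholds and names are illustrative, not taken from the Neo paper.

def choose_strategy(num_rows, dim, replicate_cap=10_000,
                    row_cap=1_000_000, dim_cap=256):
    """Pick a parallelism strategy for one embedding table."""
    if num_rows <= replicate_cap:
        return "data"    # small table: replicate on every device
    if num_rows > row_cap:
        return "row"     # huge table: shard its rows across devices
    if dim > dim_cap:
        return "column"  # wide table: split the embedding dimension
    return "table"       # otherwise: whole table on one device

def place_tables(tables, num_devices):
    """Greedily assign each table a strategy and a device set."""
    placement = {}
    next_dev = 0  # round-robin counter for table-wise placement
    for name, (rows, dim) in tables.items():
        strat = choose_strategy(rows, dim)
        if strat == "table":
            devs = [next_dev % num_devices]  # one device per table
            next_dev += 1
        else:
            devs = list(range(num_devices))  # spans all devices
        placement[name] = (strat, devs)
    return placement

# Hypothetical feature tables: (number of rows, embedding dimension).
tables = {
    "user_id":      (50_000_000, 128),  # tall -> row-wise
    "page_type":    (2_000, 64),        # tiny -> data parallelism
    "wide_feature": (500_000, 512),     # wide -> column-wise
    "country":      (200_000, 64),      # medium -> table-wise
}
print(place_tables(tables, num_devices=8))
```

Running this prints a strategy and device list per table; a real system would also balance memory and bandwidth across devices rather than round-robin blindly.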
| Year | Citations |
|---|---|
| 2017 | 75.5K |
| 2015 | 46.2K |
| 2017 | 18.2K |
| 2019 | 16.2K |
| 2009 | 11.4K |
| 2017 | 9K |
| 2010 | 8.6K |
| 2017 | 6.4K |
| 2015 | 4.6K |
| 2016 | 3.3K |