Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models

Publication | Open Access

Year: 2022
Citations: 47
References: 10

Shibo Wang, Jinliang Wei, Amit Sabne, Andy Davis, Berkin Ilbeyi, Blake A. Hechtman, Dehao Chen, Karthik Murthy, Marcello Maggioni, Qiao Zhang,

Unknown Venue

Artificial Intelligence, Cluster Computing, Engineering, Machine Learning, Computer Architecture, Communication Complexity, Dependent Computation, Data Science, Approximate Computing, Sparse Neural Network, Multi-task Learning, Parallel Computing, Massively-parallel Computing, Overlap Communication, Computer Engineering, Computer Science, Deep Learning, GPU Cluster, Large Models, Hardware Acceleration, Distributed Accelerator Cluster, Many-core Architecture, Domain-specific Accelerator, Parallel Programming, Over-the-air Computation, Intra-layer Model Parallelism

Abstract

Large deep learning models have achieved state-of-the-art results in many tasks. However, running these models on an accelerator (GPU or TPU) is challenging because the on-device memory is too limited for their size. Intra-layer model parallelism addresses this issue by partitioning individual layers or operators across multiple devices in a distributed accelerator cluster. Yet the data communication generated by intra-layer model parallelism can account for a significant proportion of the overall execution time and severely hurt computational efficiency.
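To make the partitioning idea concrete, the following is a minimal sketch of intra-layer model parallelism, simulated in plain Python. It is an illustration under stated assumptions, not the paper's method: the "devices" are ordinary lists, the weight matrix is split column-wise, and the helper names (`split_columns`, `all_gather_columns`, `sharded_layer`) are hypothetical. The concatenation step stands in for the inter-device communication that the abstract identifies as a cost.

```python
# Illustrative simulation only: "devices" are plain Python lists, not real accelerators.
# All function names here are hypothetical and chosen for this sketch.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, num_devices):
    """Partition the weight matrix column-wise into equal shards, one per device."""
    cols = len(w[0])
    assert cols % num_devices == 0, "columns must divide evenly across devices"
    width = cols // num_devices
    return [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]

def all_gather_columns(partials):
    """Stand-in for an all-gather collective: concatenate each device's output columns."""
    return [sum((p[i] for p in partials), []) for i in range(len(partials[0]))]

def sharded_layer(x, w, num_devices):
    """Compute x @ w with w partitioned across simulated devices."""
    shards = split_columns(w, num_devices)
    # Each device multiplies the full input by its own shard of the weights.
    partials = [matmul(x, shard) for shard in shards]
    # Communication step: gather the partial outputs into the full result.
    return all_gather_columns(partials)

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = sharded_layer(x, w, num_devices=2)
```

In this sketch each device holds only half of `w`, which is the memory saving, while the gather step is pure communication that runs after the local computation has finished.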
