Publication | Open Access
Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models
Citations: 47
References: 10
Year: 2022
Unknown Venue
Artificial Intelligence, Cluster Computing, Engineering, Machine Learning, Computer Architecture, Communication Complexity, Dependent Computation, Data Science, Approximate Computing, Sparse Neural Network, Multi-task Learning, Parallel Computing, Massively-parallel Computing, Overlap Communication, Computer Engineering, Computer Science, Deep Learning, GPU Cluster, Large Models, Hardware Acceleration, Distributed Accelerator Cluster, Many-core Architecture, Domain-specific Accelerator, Parallel Programming, Over-the-air Computation, Intra-layer Model Parallelism
Large deep learning models have achieved state-of-the-art results on many tasks. However, running these models on an accelerator (GPU or TPU) is challenging because on-device memory is too limited for their size. Intra-layer model parallelism addresses this issue by partitioning individual layers or operators across multiple devices in a distributed accelerator cluster. Yet the data communication generated by intra-layer model parallelism can account for a significant proportion of the overall execution time and severely hurt computational efficiency.
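To make the partitioning idea concrete, the following is a minimal sketch of intra-layer model parallelism for a single dense layer. It is an illustration under stated assumptions, not the method of the paper: the "devices" are simulated as list entries in one process, the weight matrix is split column-wise, and the helper names (`split_columns`, `all_gather_columns`, `sharded_layer`) are hypothetical. The concatenation step stands in for the collective communication that, on a real cluster, contributes to the overhead described above.

```python
# Illustrative sketch only: devices are simulated in-process, and all helper
# names are hypothetical rather than taken from any library or from the paper.

def matmul(a, b):
    """Plain dense matrix product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, num_devices):
    """Partition weight matrix w column-wise into num_devices equal shards."""
    cols = len(w[0])
    assert cols % num_devices == 0, "columns must divide evenly across devices"
    step = cols // num_devices
    return [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]

def all_gather_columns(partials):
    """Stand-in for the collective: concatenate per-device outputs along columns."""
    return [sum((p[i] for p in partials), []) for i in range(len(partials[0]))]

def sharded_layer(x, w, num_devices):
    """Compute x @ w with w partitioned across simulated devices."""
    shards = split_columns(w, num_devices)            # each device holds one shard
    partials = [matmul(x, shard) for shard in shards]  # local compute per device
    return all_gather_columns(partials)                # communication step

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = sharded_layer(x, w, num_devices=2)
assert sharded == full  # partitioning changes where work happens, not the result
```

In this sketch each simulated device stores only half of the weight matrix, which is the memory benefit the abstract refers to; the cost is the gather step, which on real hardware must move data between devices.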