Publication | Closed Access
TensorExpress: In-Network Communication Scheduling for Distributed Deep Learning
Citations: 14
References: 4
Year: 2020
Venue: Unknown
Keywords: Switch Programming Language, Distributed Deep Learning, Engineering, High Performance Computer Network, Network Traffic Control, Cloud Computing, Computer Engineering, Computer Science, Distributed Learning, Parallel Computing, Deep Learning, Distributed Model, Advanced Networking, Tensor Packet
TensorExpress provides in-network communication scheduling for distributed deep learning (DDL). In cloud-based DDL, communicating parameters over the network is a key bottleneck. Previous studies proposed tensor packet reordering to reduce network blocking time, but network contention still remains. TensorExpress mitigates this contention and reduces overall training time by scheduling tensor packets inside the network using P4, a switch programming language. It improves latency and network blocking time by up to 2.5 and 2.44 times, respectively.
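The abstract does not describe the scheduling policy itself, so the following is only an illustrative sketch of what scheduling tensor packets at a switch could look like. It assumes a hypothetical strict-priority rule in which packets carrying tensors of earlier layers (lower layer index) are forwarded first, with ties served in arrival order. The names `TensorPacket`, `PrioritySwitchQueue`, and `schedule` are invented for this example, and the sketch is written in Python rather than P4 for readability; it is not the TensorExpress implementation.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class TensorPacket:
    # Assumption: a lower priority value is forwarded first.
    priority: int
    # Arrival sequence number, used only to break ties in FIFO order.
    seq: int
    worker: str = field(compare=False)
    layer: int = field(compare=False)


class PrioritySwitchQueue:
    """Toy model of one switch egress queue with strict priority."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()

    def enqueue(self, worker, layer):
        # Hypothetical policy: the layer index is used directly as priority.
        pkt = TensorPacket(priority=layer, seq=next(self._arrivals),
                           worker=worker, layer=layer)
        heapq.heappush(self._heap, pkt)

    def drain(self):
        # Pop packets in (priority, arrival order) until the queue is empty.
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap))
        return out


def schedule(arrivals):
    """Return (worker, layer) pairs in the order the toy switch forwards them."""
    queue = PrioritySwitchQueue()
    for worker, layer in arrivals:
        queue.enqueue(worker, layer)
    return [(p.worker, p.layer) for p in queue.drain()]


if __name__ == "__main__":
    contended = [("w1", 3), ("w2", 0), ("w1", 1), ("w2", 3)]
    print(schedule(contended))
```

In this toy model, packets from two workers that contend for the same output port leave in layer order regardless of which worker sent them. That is one plausible way a switch could reduce blocking for the tensors needed first, under the assumption stated above.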