Publication | Open Access
SiP-ML
Citations: 80
References: 91
Year: 2021
Unknown Venue
Keywords: Photonics, Optical Network Interconnects, Task Partitioning, Engineering, Optical Interconnects, Edge Computing, Computer Engineering, Computer Architecture, Interconnection Network, Computer Science, Interconnection Network Architecture, Parallel Computing, Deep Learning, Programmable Photonics, Optical Networking, Silicon Photonics, Optical Computing
This paper proposes optical network interconnects as a key enabler for building high-bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML, reduces the training time of popular DNN models using silicon photonics links that can provide multiple terabits per second of bandwidth per GPU. SiP-ML partitions the training job across GPUs with hybrid data and model parallelism while ensuring that the communication pattern can be supported efficiently on the network interconnect. We develop task partitioning and device placement methods that take the degree and reconfiguration latency of optical interconnects into account. Simulations using real DNN models show that, compared to state-of-the-art electrical networks, our approach improves training time by 1.3x to 9.1x.
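To make the degree constraint mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the paper's partitioning or placement algorithm. It assumes each GPU can hold direct optical links to at most `max_degree` peers and checks whether a given communication pattern fits within that limit. All function names (`peer_degrees`, `fits_degree`, `ring_pattern`, `all_to_all_pattern`) and the example degree values are hypothetical.

```python
from collections import defaultdict


def peer_degrees(transfers):
    """Count the distinct peers each GPU talks to.

    transfers: iterable of (src_gpu, dst_gpu) pairs (hypothetical format).
    """
    peers = defaultdict(set)
    for src, dst in transfers:
        if src == dst:
            continue  # a GPU needs no link to itself
        peers[src].add(dst)
        peers[dst].add(src)
    return {gpu: len(p) for gpu, p in peers.items()}


def fits_degree(transfers, max_degree):
    """True if no GPU needs more direct links than max_degree (assumed limit)."""
    return all(d <= max_degree for d in peer_degrees(transfers).values())


def ring_pattern(gpus):
    """Each GPU sends to its successor, as in a ring all-reduce."""
    n = len(gpus)
    return [(gpus[i], gpus[(i + 1) % n]) for i in range(n)]


def all_to_all_pattern(gpus):
    """Every GPU sends to every other GPU."""
    return [(a, b) for a in gpus for b in gpus if a != b]


gpus = list(range(8))
ring_ok = fits_degree(ring_pattern(gpus), max_degree=2)
dense_ok = fits_degree(all_to_all_pattern(gpus), max_degree=4)
```

Under these assumptions, a ring over eight GPUs needs only two peers per GPU, while an all-to-all exchange needs seven, which is why a partitioning method for a degree-limited interconnect would favor patterns of the first kind.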