Publication | Open Access
Lightwave Fabrics: At-Scale Optical Circuit Switching for Datacenter and Machine Learning Systems
Citations: 48
References: 36
Year: 2023
Unknown Venue
Topics: Optical Materials, Engineering, Computer Architecture, Education, Fiber Optics, Programmable Photonics, Optical Computing, Optical Networks, Optical Properties, Computing Systems, Optical Switching, Photonic Integrated Circuit, Optical Systems, Optical Communication, Parallel Computing, Performance Improvement, Non-terrestrial Optical Networks, Optical Networking, Optical Fiber, Free-space Optical Network, Photonics, Electrical Engineering, Optical Interconnects, Lightwave Fabrics, Computer Engineering, Computer Science, Lightwave Fabric, Technology, Machine Learning Systems, Optoelectronics
We describe our experience developing what we believe to be the world's first large-scale production deployments of lightwave fabrics used for both datacenter networking and machine-learning (ML) applications. Using optical circuit switches (OCSes) and optical transceivers developed in-house, we employ hardware and software codesign to integrate the fabrics into our network and computing infrastructure. Key to our design is a high degree of multiplexing enabled by new kinds of wavelength-division-multiplexing (WDM) and optical circulators that support high-bandwidth bidirectional traffic on a single strand of optical fiber. The development of the requisite OCS and optical transceiver technologies leads to a synchronous lightwave fabric that is reconfigurable, low latency, rate agnostic, and highly available. These fabrics have provided substantial benefits for long-lived traffic patterns in our datacenter networks and predictable traffic patterns in tightly-coupled machine learning clusters. We report results for a large-scale ML superpod with 4096 tensor processing unit (TPU) V4 chips that has more than one ExaFLOP of computing power. For this use case, the deployment of a lightwave fabric provides up to 3× better system availability and model-dependent performance improvements of up to 3.3× compared to a static fabric, despite constituting less than 6% of the total system cost.
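As a rough sanity check on the superpod figures quoted above, the short sketch below divides the stated aggregate compute by the stated chip count. It uses only the two numbers given in the abstract (4096 chips, more than one ExaFLOP); the function and constant names are illustrative, and the result is an implied lower bound per chip, not a published per-chip specification.

```python
# Figures taken directly from the abstract.
NUM_CHIPS = 4096                      # TPU chips in the superpod
AGGREGATE_FLOPS_LOWER_BOUND = 1e18    # "more than one ExaFLOP"


def implied_per_chip_tflops(total_flops: float, chips: int) -> float:
    """Return the per-chip throughput (in TFLOPS) implied by an aggregate figure.

    This is plain division; it assumes compute is spread evenly across chips,
    which is an assumption of this sketch rather than a claim from the source.
    """
    return total_flops / chips / 1e12


per_chip = implied_per_chip_tflops(AGGREGATE_FLOPS_LOWER_BOUND, NUM_CHIPS)
print(f"Implied lower bound: {per_chip:.1f} TFLOPS per chip")
```

Because the abstract says "more than" one ExaFLOP, the value computed here should be read as a floor on the average per-chip throughput.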