Publication | Closed Access
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
Citations: 578
References: 5
Year: 2018
Venue: Unknown
Engineering, Machine Learning, Machine Learning Pipelines, Machine Learning Tool, Distributed Data Analytics, Big Data Infrastructure, Data Science, Hardware Design, Distributed Machine Learning, Embedded Machine Learning, Parallel Computing, Machine Learning Model, Knowledge Discovery, Computer Science, Applied Machine Learning, Social Computing, Federated Learning, Massive Data Processing, Big Data
Machine learning underpins many Facebook products, with diverse workloads, massive data pipelines, and intensive GPU/CPU demands that drive continuous infrastructure innovation. The paper describes the hardware and software infrastructure that supports this machine learning at global scale.
Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook's machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.