Publication | Closed Access
Morphling
42
Citations
32
References
2021
Year
Unknown Venue
Cluster Computing, Hyperparameter Estimation, Engineering, Machine Learning, Bayesian Optimization, Data Science, Machine Learning Tool, Model Tuning, Parameter Tuning, Model Deployment, Computer Engineering, Computer Architecture, Production Cloud, Machine Learning Models, Parallel Programming, Computer Science, Parallel Computing
Machine learning models are widely deployed in production clouds to provide online inference services. Deploying inference services efficiently requires careful tuning of hardware and runtime configurations (e.g., GPU type, GPU memory, batch size), which can significantly improve model serving performance and reduce cost. However, existing autoconfiguration approaches for general workloads, such as Bayesian optimization and white-box prediction, navigate the high-dimensional configuration space of model serving inefficiently and incur high sampling cost.