Publication | Closed Access
Interference-Aware Scheduling for Inference Serving
Citations: 32
References: 14
Year: 2021
Unknown Venue
Cluster Computing, Engineering, Machine Learning, Latency Degradation, Machine Learning Tool, Data Science, Embedded Machine Learning, Inference Applications, Distributed Model, Predictive Analytics, Model Deployment, Scheduling (Computing), Computer Science, Mobile Computing, Interference-aware Scheduling, Scheduling Analysis, Scheduling Problem, Automated Reasoning, Edge Computing, Latency Requirements, Cloud Computing, Big Data
Machine learning inference applications have proliferated through diverse domains such as healthcare, security, and analytics. Recent work has proposed inference serving systems for improving the deployment and scalability of models. To improve resource utilization, multiple models can be co-located on the same backend machine. However, co-location can cause latency degradation due to interference and can subsequently violate latency requirements. Although interference-aware schedulers for general workloads have been introduced, they do not scale appropriately to heterogeneous inference serving systems where the number of co-location configurations grows exponentially with the number of models and machine types.
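To make the scaling claim concrete, the following minimal Python sketch counts co-location configurations under one simplifying assumption that is not taken from the paper: any non-empty subset of models may share a single machine, and a configuration is a (machine type, model group) pair. The function names and the example model and machine labels are hypothetical. Under this assumption the count is exponential in the number of models; the abstract does not define the paper's own configuration space, which may differ.

```python
from itertools import combinations


def colocation_configs(models, machine_types):
    """Enumerate (machine_type, model_group) pairs.

    Assumption (illustrative only): every non-empty subset of models
    is a valid group to co-locate on one machine of a given type.
    """
    configs = []
    for machine in machine_types:
        for size in range(1, len(models) + 1):
            for group in combinations(models, size):
                configs.append((machine, group))
    return configs


def count_colocation_configs(num_models, num_machine_types):
    """Closed-form count under the same assumption: m * (2^n - 1)."""
    return num_machine_types * (2 ** num_models - 1)


if __name__ == "__main__":
    # Hypothetical model and machine-type labels.
    models = ["resnet", "bert", "yolo"]
    machine_types = ["cpu", "gpu"]
    print(len(colocation_configs(models, machine_types)))
    for n in (5, 10, 20):
        print(n, count_colocation_configs(n, len(machine_types)))
```

Even in this simplified setting, profiling every configuration quickly becomes impractical as models are added, which is the scaling problem the abstract attributes to interference-aware schedulers built for general workloads.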