Publication | Closed Access
A Case for Managed and Model-less Inference Serving
Citations: 30
References: 32
Year: 2019
Venue: Unknown
Keywords: Artificial Intelligence, Engineering, Machine Learning, Machine Learning Tool, Machine Learning Models, Inductive Inference, Bayesian Inference, Statistical Relational Learning, Data Science, Embedded Machine Learning, Statistics, Machine Learning Model, Predictive Analytics, Knowledge Discovery, Computer Engineering, Facebook Applications, Computer Science, Deep Learning, Neural Architecture Search, Automated Reasoning, Parameter Tuning, Statistical Inference, Model-less Inference, Inference Queries
The number of applications relying on inference from machine learning models, especially neural networks, is already large and expected to keep growing. For instance, Facebook applications issue tens of trillions of inference queries per day with varying performance, accuracy, and cost constraints. Unfortunately, today's inference serving systems are neither easy to use nor cost-effective. Developers must manually match the performance, accuracy, and cost constraints of their applications to a large design space that includes decisions such as selecting the right model and model optimizations, selecting the right hardware architecture, selecting the right scale-out factor, and avoiding cold-start effects. These interacting decisions are difficult to make, especially when application load, the applications themselves, and the available resources all vary over time.
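The design-space search the abstract describes can be sketched concretely. The snippet below is a minimal, hypothetical illustration (not the paper's system): given per-replica latency, accuracy, and cost profiles for a few model/hardware pairs, it picks the cheapest configuration, including a scale-out factor, that satisfies an application's latency and accuracy constraints. All model names, profile numbers, and the throughput approximation are illustrative assumptions.

```python
# Hypothetical sketch of the model/hardware/scale-out selection problem
# described in the abstract. All names and numbers are illustrative.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    model: str      # model variant (e.g. full vs. smaller model)
    hardware: str   # target hardware architecture
    replicas: int   # scale-out factor


# Illustrative per-replica profiles: (latency_ms, accuracy, cost_per_hour)
PROFILES = {
    ("resnet50", "cpu"): (120.0, 0.76, 0.10),
    ("resnet50", "gpu"): (15.0, 0.76, 0.90),
    ("resnet18", "cpu"): (40.0, 0.70, 0.10),
    ("resnet18", "gpu"): (6.0, 0.70, 0.90),
}


def pick_config(max_latency_ms, min_accuracy, load_qps):
    """Return the cheapest Config meeting the constraints, or None."""
    best, best_cost = None, None
    for (model, hw), (lat, acc, cost) in PROFILES.items():
        if lat > max_latency_ms or acc < min_accuracy:
            continue  # violates a performance or accuracy constraint
        # Crude sequential-throughput approximation per replica.
        throughput = 1000.0 / lat  # queries per second
        replicas = max(1, math.ceil(load_qps / throughput))
        total_cost = replicas * cost
        if best_cost is None or total_cost < best_cost:
            best, best_cost = Config(model, hw, replicas), total_cost
    return best
```

Even this toy version shows why manual tuning is brittle: the right answer flips between model variants and hardware types as the constraints or the load change, which is the burden a managed, model-less serving system would lift from developers.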