Publication | Open Access
Data Management Challenges in Production Machine Learning
Citations: 199
References: 18
Year: 2017
Unknown Venue
Engineering, Machine Learning, Data Management Issue, Machine Learning Tool, Database Literature, Data Science, Data Mining, Database Processing, Management, Data Integration, Data Pre-processing, Data Management Challenges, Data Management, Open Research Questions, Data Modeling, Engineering Data Management, Knowledge Discovery, Such Largescale Pipelines, Computer Science, Data Engineering, Model Production, Data Treatment, Industrial Informatics, Big Data
The tutorial discusses data-management issues that arise in the context of machine learning pipelines deployed in production. Informed by our own experience with such large-scale pipelines, we focus on issues related to understanding, validating, cleaning, and enriching training data. The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art.