Publication | Closed Access
Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization
Citations: 11
References: 38
Year: 2018
Keywords: Mathematical Programming, Artificial Intelligence, Large-scale Global Optimization, Engineering, Machine Learning, Computational Complexity, Hyperparameter Estimation, Data Science, Data Mining, Parameterized Algorithm, Statistical Machine Learning, SimSQL Database System, Data Optimization, Large Scale Optimization, Computer Science, User-defined Functions, Large-scale Machine Learning, Large-scale Optimization, Adaptive Optimization, Declarative Parameterizations, Theory Of Computing, Model Optimization, Parameter Tuning
Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern in the context of a fully declarative data processing system. The join-and-co-group pattern is ubiquitous in iterative, large-scale optimization. In the join-and-co-group pattern, a user-defined function g is parameterized with a data object x as well as the subset of the statistical model Θ<sub>x</sub> that applies to that object, so that g(x|Θ<sub>x</sub>) can be used to compute a partial update of the model. This is repeated for every x in the full data set X. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern poses several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the join-and-co-group pattern, an execution can easily take a very long time or even fail to complete. In this paper, we carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, which is an SQL-based system with special facilities for large-scale, iterative optimization. Since it is an SQL-based system with a query optimizer, those choices can be made automatically.
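The pattern described in the abstract can be illustrated with a minimal in-memory sketch. This is not SimSQL code and not the authors' implementation: the function `join_and_co_group`, the helpers `keys_of`, `g` and `combine`, and the sparse linear model used as the example are all hypothetical names and choices made here for illustration. The sketch also ignores the physical-plan question the paper addresses; it only shows the logical steps (look up Θ<sub>x</sub> for each x, apply g, aggregate the partial updates, update the model).

```python
from collections import defaultdict


def join_and_co_group(X, theta, keys_of, g, combine):
    """One logical iteration of the pattern over an in-memory data set.

    X        -- iterable of data objects x
    theta    -- dict mapping a model key to its current value
    keys_of  -- hypothetical helper: returns the model keys that apply to x
    g        -- user-defined function g(x, theta_x) -> {key: partial update}
    combine  -- folds a list of partial updates into one model entry
    """
    partials = defaultdict(list)
    for x in X:
        # "Join": fetch only the slice of the model that applies to x.
        theta_x = {k: theta[k] for k in keys_of(x)}
        # Apply the user-defined function to obtain a partial update.
        for k, delta in g(x, theta_x).items():
            partials[k].append(delta)
    # "Co-group": gather all partial updates per model entry and apply them.
    new_theta = dict(theta)
    for k, deltas in partials.items():
        new_theta[k] = combine(theta[k], deltas)
    return new_theta


# Illustrative assumption: a sparse linear model trained with squared loss.
# Each x is (features, label); only the weights of its features form theta_x.
def keys_of(x):
    features, _ = x
    return features.keys()


def g(x, theta_x):
    features, label = x
    error = sum(theta_x[k] * v for k, v in features.items()) - label
    # Gradient of 0.5 * error**2 with respect to each touched weight.
    return {k: error * v for k, v in features.items()}


def combine(weight, deltas, learning_rate=0.5):
    return weight - learning_rate * sum(deltas)


X = [({"a": 1.0}, 2.0), ({"a": 1.0, "b": 2.0}, 0.0)]
theta = {"a": 0.0, "b": 0.0, "c": 5.0}
updated = join_and_co_group(X, theta, keys_of, g, combine)
```

In this toy version every Θ<sub>x</sub> is materialized as a small dictionary per data object. Doing the same naively at scale is where the blow-up mentioned in the abstract comes from, since the total size of all the Θ<sub>x</sub> slices can far exceed the size of the model itself.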