Publication | Closed Access
Scalable Semisupervised GMM for Big Data Quality Prediction in Multimode Processes
Citations: 87
References: 48
Year: 2018
Engineering, Machine Learning, Unsupervised Machine Learning, Big Data Model, Data Science, Data Mining, Mixture Analysis, Regression Model, Bayesian Methods, Multimode Processes, Parameter Updating, Statistics, Predictive Analytics, Gaussian Analysis, Computer Science, Big Data Acquisition, Robust Modeling, Gaussian Process, Gaussian Mixture Model, Statistical Inference, Massive Data Processing, Big Data
In this paper, a novel variational inference semisupervised Gaussian mixture model (VI-S²GMM) is first proposed for semisupervised predictive modeling in multimode processes. With the extra unlabeled samples, the parameters of the Gaussian components are identified more accurately, which improves the prediction performance of the regression model. Because all labeled and unlabeled samples are involved in every iteration of parameter updating, computation becomes intractable on high-dimensional datasets. To tackle this problem, a scalable stochastic VI-S²GMM (SVI-S²GMM) is further proposed. By using a stochastic gradient optimization algorithm to maximize the evidence lower bound, the VI-based algorithm becomes scalable. In the SVI-S²GMM, only one sample or a minibatch of samples is randomly selected to update the parameters in each iteration, which is more efficient than the VI-S²GMM. Because the whole dataset is divided and passed to the iterations batch by batch, the SVI-S²GMM algorithm can readily handle big data modeling. In this way, a large amount of unlabeled data can be used in the modeling, which further benefits the prediction performance. The SVI-S²GMM is then applied to the prediction of a quality-related key performance index. Two examples demonstrate the feasibility and effectiveness of the proposed algorithms.
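The minibatch mechanism described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' SVI-S²GMM: it uses stepwise (online) EM on a plain diagonal-covariance GMM, with no variational posteriors over the parameters and no labeled quality variable. What it shares with the abstract is the core idea that each iteration touches only a random minibatch, and that the minibatch statistics are blended into running estimates with a decaying step size. The function names, the caller-supplied initial means `mu0`, and the schedule `rho = (t + 1 + tau) ** (-kappa)` are all assumptions made for this illustration.

```python
import numpy as np

def responsibilities(X, pi, mu, var):
    """Posterior component probabilities for each sample, shape (n, K)."""
    diff = X[:, None, :] - mu[None, :, :]                      # (n, K, d)
    log_pdf = -0.5 * (np.sum(diff**2 / var, axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1))
    log_r = np.log(pi) + log_pdf
    log_r -= log_r.max(axis=1, keepdims=True)                  # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def minibatch_gmm(X, mu0, batch_size=64, n_steps=500, tau=1.0, kappa=0.7, seed=0):
    """Stepwise-EM sketch (hypothetical helper, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = np.asarray(mu0, dtype=float).copy()
    K = mu.shape[0]
    # Initial variances come from one random minibatch, not the full dataset.
    first = X[rng.choice(n, batch_size, replace=False)]
    var = np.tile(first.var(axis=0) + 1e-6, (K, 1))
    pi = np.full(K, 1.0 / K)
    # Running per-sample averages of the sufficient statistics.
    s0 = pi.copy()
    s1 = pi[:, None] * mu
    s2 = pi[:, None] * (var + mu**2)
    for t in range(n_steps):
        batch = X[rng.choice(n, batch_size, replace=False)]
        r = responsibilities(batch, pi, mu, var)               # (B, K)
        rho = (t + 1 + tau) ** (-kappa)                        # decaying step size
        s0 = (1 - rho) * s0 + rho * r.mean(axis=0)
        s1 = (1 - rho) * s1 + rho * (r.T @ batch) / batch_size
        s2 = (1 - rho) * s2 + rho * (r.T @ batch**2) / batch_size
        # Re-estimate the parameters from the blended statistics.
        pi = s0 / s0.sum()
        mu = s1 / s0[:, None]
        var = np.maximum(s2 / s0[:, None] - mu**2, 1e-6)
    return pi, mu, var
```

Choosing `kappa` in (0.5, 1] gives step sizes whose sum diverges while the sum of their squares converges, which is the usual condition for stochastic approximation schemes of this kind. Each step costs on the order of `batch_size * K * d` operations regardless of the dataset size, which is the scalability argument the abstract makes for minibatch updates.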