Scalable Semisupervised GMM for Big Data Quality Prediction in Multimode Processes

Abstract

In this paper, a novel variational inference semisupervised Gaussian mixture model (VI-S²GMM) is first proposed for semisupervised predictive modeling in multimode processes. With the extra unlabeled samples, the parameters of the Gaussian components are identified more accurately, which improves the prediction performance of the regression model. However, because all labeled and unlabeled samples are involved in every parameter-update iteration, computation becomes intractable on high-dimensional datasets. To tackle this problem, a scalable stochastic VI-S²GMM (SVI-S²GMM) is further proposed. By using a stochastic gradient optimization algorithm to maximize the evidence lower bound, the VI-based algorithm becomes scalable. In the SVI-S²GMM, only one sample or a minibatch of samples is randomly selected to update the parameters in each iteration, which is more efficient than the VI-S²GMM. Because the whole dataset is divided and fed to the iterations batch by batch, the SVI-S²GMM algorithm can readily handle big data modeling. In this way, a large amount of unlabeled data can be exploited in the modeling, which further benefits the prediction performance. The SVI-S²GMM is then applied to the prediction of a quality-related key performance index. Two examples demonstrate the feasibility and effectiveness of the proposed algorithms.
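
The minibatch idea in the abstract can be illustrated with a minimal sketch. It is not the SVI-S²GMM itself: it assumes univariate, unlabeled data, uses point estimates rather than variational posteriors, and replaces the natural-gradient step on the evidence lower bound with a stochastic-approximation (stepwise EM) update of the mixture's sufficient statistics. The function names and the step-size schedule rho_t = (t + tau)^(-kappa) are assumptions made for illustration only.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def responsibilities(x, weights, means, variances):
    """Posterior component probabilities for one sample (log-sum-exp for stability)."""
    logs = [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def params_from_stats(s0, s1, s2, min_var=1e-6):
    """Recover weights, means, variances from averaged sufficient statistics."""
    total = sum(s0)
    weights = [a / total for a in s0]
    means = [b / a for a, b in zip(s0, s1)]
    variances = [max(c / a - m * m, min_var) for a, c, m in zip(s0, s2, means)]
    return weights, means, variances


def fit_minibatch_gmm(data, init_means, batch_size=20, n_iters=500,
                      tau=1.0, kappa=0.7, seed=0):
    """Hypothetical helper: fit a 1-D GMM touching only one minibatch per iteration."""
    rng = random.Random(seed)
    k = len(init_means)
    # Running per-sample averages of r_k, r_k * x, r_k * x^2 (unit initial variance).
    s0 = [1.0 / k] * k
    s1 = [m / k for m in init_means]
    s2 = [(m * m + 1.0) / k for m in init_means]
    for t in range(1, n_iters + 1):
        weights, means, variances = params_from_stats(s0, s1, s2)
        batch = rng.sample(data, batch_size)  # random minibatch, not the full dataset
        b0, b1, b2 = [0.0] * k, [0.0] * k, [0.0] * k
        for x in batch:
            r = responsibilities(x, weights, means, variances)
            for j in range(k):
                b0[j] += r[j] / batch_size
                b1[j] += r[j] * x / batch_size
                b2[j] += r[j] * x * x / batch_size
        rho = (t + tau) ** (-kappa)  # decaying step size (assumed schedule)
        s0 = [(1.0 - rho) * a + rho * b for a, b in zip(s0, b0)]
        s1 = [(1.0 - rho) * a + rho * b for a, b in zip(s1, b1)]
        s2 = [(1.0 - rho) * a + rho * b for a, b in zip(s2, b2)]
    return params_from_stats(s0, s1, s2)
```

Each iteration costs time proportional to the minibatch size rather than to the size of the dataset, which is the property the abstract relies on for scalability. A decay exponent kappa in (0.5, 1] keeps the step sizes within the usual Robbins-Monro conditions for stochastic approximation.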
