Publication | Closed Access
An Ensemble Machine-Learning Model To Predict Historical PM<sub>2.5</sub> Concentrations in China from Satellite Data
354
Citations
48
References
2018
Year
The long satellite aerosol data record enables assessments of historical PM<sub>2.5</sub> levels in regions where routine PM<sub>2.5</sub> monitoring began only recently. However, most previous models reported decreased prediction accuracy when predicting PM<sub>2.5</sub> levels outside the model-training period. In this study, we proposed an ensemble machine learning approach that provided reliable PM<sub>2.5</sub> hindcast capabilities. The missing satellite data were first filled by multiple imputation. Then the modeling domain, China, was divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine learning models, including random forest, generalized additive model, and extreme gradient boosting, was trained in each region separately. Finally, a generalized additive ensemble model was developed to combine predictions from the different algorithms. The ensemble prediction characterized the spatiotemporal distribution of daily PM<sub>2.5</sub> well, with a cross-validation (CV) R<sup>2</sup> (RMSE) of 0.79 (21 μg/m<sup>3</sup>). The cluster-based subregion models outperformed national models and improved the CV R<sup>2</sup> by ∼0.05. Compared with previous studies, our model provided more accurate predictions outside the model-training period at the daily level (R<sup>2</sup> = 0.58, RMSE = 29 μg/m<sup>3</sup>) and monthly level (R<sup>2</sup> = 0.76, RMSE = 16 μg/m<sup>3</sup>). Our hindcast modeling system allows for the construction of unbiased historical PM<sub>2.5</sub> levels.
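The daily and monthly accuracy figures above can be illustrated with a minimal sketch of how R<sup>2</sup> and RMSE might be computed at both time scales. This is not the authors' evaluation code: it assumes R<sup>2</sup> is the coefficient of determination (1 minus residual over total sum of squares), that monthly values are simple means of the daily values, and the function names and sample records are hypothetical.

```python
import math
from collections import defaultdict


def r2_rmse(observed, predicted):
    """Return (R^2, RMSE) for paired observed and predicted values.

    Assumption: R^2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def monthly_means(records):
    """Average daily (month, observed, predicted) records within each month."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # obs total, pred total, count
    for month, obs, pred in records:
        entry = sums[month]
        entry[0] += obs
        entry[1] += pred
        entry[2] += 1
    months = sorted(sums)
    obs_m = [sums[m][0] / sums[m][2] for m in months]
    pred_m = [sums[m][1] / sums[m][2] for m in months]
    return obs_m, pred_m


# Hypothetical daily records (month, observed, predicted) in ug/m^3.
daily = [
    ("2013-01", 80.0, 70.0), ("2013-01", 60.0, 70.0),
    ("2013-02", 40.0, 50.0), ("2013-02", 20.0, 10.0),
]

daily_r2, daily_rmse = r2_rmse([d[1] for d in daily], [d[2] for d in daily])
obs_m, pred_m = monthly_means(daily)
monthly_r2, monthly_rmse = r2_rmse(obs_m, pred_m)

print(f"daily:   R2={daily_r2:.2f}, RMSE={daily_rmse:.1f}")
print(f"monthly: RMSE={monthly_rmse:.1f}, R2={monthly_r2:.2f}")
```

In this toy data the daily errors cancel within each month, so the monthly metrics come out better than the daily ones. Real data would not cancel so cleanly, but the same averaging effect is one plausible reason monthly scores tend to exceed daily ones.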