Publication | Open Access
PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data
Citations: 502
References: 44
Year: 2019
Environmental Monitoring, Machine Learning, Engineering, Air Quality, Earth Science, Social Sciences, Pollution Detection, Data Science, PM2.5 Prediction, Air Quality Monitoring, Fine Particulate Matter, Machine Learning Model, Predictive Analytics, Forecasting, Deep Learning, Remote Sensing, Extreme Gradient Boosting, Air Quality Prediction, Air Pollution, Random Forest, Ensemble Algorithm
Fine particulate matter (PM2.5) pollution is a major public health issue linked to lung cancer and to cardiovascular, respiratory, and metabolic diseases. Accurate prediction can support risk warnings, yet the factors that influence prediction remain underexplored. This study investigates feature importance for PM2.5 prediction in Tehran by comparing random forest, XGBoost, and deep learning models. The models use 23 features comprising satellite, meteorological, ground-measured PM2.5, and geographic data. XGBoost achieved the best performance (R² = 0.81, MAE = 9.93 µg/m³, RMSE = 13.58 µg/m³) after unimportant features were eliminated. All three methods performed similarly: R² ranged from 0.63 to 0.67 when 3 km AOD was included and from 0.77 to 0.81 when it was excluded. Unlike the PM2.5 lag data, satellite-derived AOD did not improve accuracy.
In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk, thus mitigating complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran’s urban area, implementing three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features in the modeling, including satellite and meteorological data, ground-measured PM2.5, and geographical data. The best model performance obtained was R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, using the XGBoost approach with unimportant features eliminated. However, all three ML methods performed similarly: R² varied from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when AOD at 3 km resolution was excluded. Unlike the PM2.5 lag data, satellite-derived AODs did not improve model performance.
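For readers unfamiliar with the three scores quoted in the abstract, the following is a minimal sketch of how R², MAE, and RMSE are commonly computed from paired observed and predicted concentrations. It assumes R² means the coefficient of determination (the abstract does not state which definition was used; the reported R = 0.9 would equally fit a squared correlation coefficient). The function name `regression_metrics` and the sample values are illustrative only and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, mae, rmse) for paired observed and predicted values.

    r2 is the coefficient of determination, 1 - SS_res / SS_tot,
    which is an assumption about the definition used in the source.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    rmse = math.sqrt(ss_res / n)               # root mean squared error
    return r2, mae, rmse


# Made-up concentrations in µg/m³, for illustration only (not study data).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

MAE and RMSE carry the units of the target (here µg/m³), while R² is unitless. RMSE is never smaller than MAE and weights large errors more heavily, which is consistent with the abstract reporting an RMSE (13.58) above its MAE (9.93).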