
Publication | Open Access

PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data

Citations: 502 | References: 44 | Year: 2019

TLDR

Fine particulate matter (PM2.5) pollution is a major public health issue linked to lung cancer and to cardiovascular, respiratory, and metabolic diseases. Accurate prediction can support early risk warnings, yet the factors that influence PM2.5 prediction remain underexplored. This study investigates feature importance for PM2.5 prediction in Tehran by comparing random forest, XGBoost, and deep learning models. The models use 23 features comprising satellite, meteorological, ground‑measured PM2.5, and geographic data. XGBoost achieved the best performance (R² = 0.81, MAE = 9.93 µg/m³, RMSE = 13.58 µg/m³), but all three methods performed similarly, with R² ranging from 0.63 to 0.67 when 3 km aerosol optical depth (AOD) was included and from 0.77 to 0.81 when it was excluded. Satellite AOD did not improve accuracy.
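The three scores quoted above can be computed from paired ground measurements and model predictions as in the minimal Python sketch below. It assumes the standard definitions: R² as the coefficient of determination, and MAE and RMSE in the same units as the inputs (here µg/m³). The function name `regression_metrics` is hypothetical and is not taken from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R², MAE and RMSE for paired observations and predictions.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Residuals: prediction minus observed concentration.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Total sum of squares around the mean of the observations.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]   # illustrative values, not study data
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(regression_metrics(observed, predicted))
```

For the illustrative observations [10, 20, 30, 40] and predictions [12, 18, 33, 37], the function returns MAE = 2.5, RMSE = √6.5 ≈ 2.55, and R² = 1 − 26/500 = 0.948.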

Abstract

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk, thus mitigating the complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran’s urban area, implementing three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features, including satellite and meteorological data, ground-measured PM2.5, and geographical data, in the modeling. The best model performance, R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, was obtained with the XGBoost approach after eliminating unimportant features. However, all three ML methods performed similarly: R² varied from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. Unlike the lagged PM2.5 data, satellite-derived AODs did not improve model performance.
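Two steps mentioned in the abstract, using lagged PM2.5 values as predictors and eliminating unimportant features, can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' method: the series is taken to hold one PM2.5 reading per time step at a single station, and the lags, feature names (`aod_3km`, `temperature`), importance scores, and threshold are all hypothetical. The abstract does not state which criterion the authors used to judge a feature unimportant; in a real pipeline the scores would typically come from the fitted random forest or XGBoost model.

```python
from typing import Dict, List, Optional, Sequence


def add_lag_features(pm25: Sequence[float], lags: Sequence[int]) -> List[Dict[str, Optional[float]]]:
    """Build one row per time step holding the current PM2.5 value and the
    values observed `lag` steps earlier (hypothetical helper).

    Rows near the start of the series lack enough history, so those lag
    columns are set to None and would need to be dropped or imputed.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for t in range(len(pm25)):
        row: Dict[str, Optional[float]] = {"pm25": pm25[t]}
        for lag in lags:
            row[f"pm25_lag{lag}"] = pm25[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


def drop_unimportant(importances: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose importance score is at least `threshold`,
    ordered from most to least important (hypothetical selection rule)."""
    kept = [name for name, score in importances.items() if score >= threshold]
    return sorted(kept, key=lambda name: importances[name], reverse=True)


if __name__ == "__main__":
    rows = add_lag_features([30.0, 42.0, 35.0], lags=[1, 2])
    print(rows[2])  # current value plus the two previous readings
    # Hypothetical importance scores and threshold, for illustration only.
    scores = {"pm25_lag1": 0.42, "aod_3km": 0.01, "temperature": 0.12}
    print(drop_unimportant(scores, threshold=0.05))
```

With these made-up inputs, the third row pairs the current reading 35.0 with lag-1 and lag-2 values of 42.0 and 30.0, and the selection keeps `pm25_lag1` and `temperature` while discarding `aod_3km`. The model would then be refit on the retained features only.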
