Estimation of daily PM10 and PM2.5 concentrations in Italy, 2013–2015, using a spatiotemporal land-use random-forest model

TLDR

Particulate matter air pollution is a leading global cause of death, with both short-term and long-term health effects, yet most epidemiological studies focus on cities because reliable spatiotemporal exposure estimates are lacking in nonurban areas. This study estimates daily PM10, PM2.5, and PM2.5–10 concentrations across Italy on a 1-km² grid for 2013–2015 using a Random Forest machine-learning approach. Separate Random Forest models were built in five stages: predicting PM2.5 and PM2.5–10 at monitors where only PM10 data existed; imputing missing satellite aerosol optical depth from atmospheric ensemble model estimates; linking measured PM to satellite, land-use, and meteorological variables; applying that model to every 1-km² grid cell; and refining the predictions with small-scale predictors computed at monitor sites or within small buffers. The models explained most PM variability, with mean cross-validation R² of 0.75 (PM10) and 0.80 (PM2.5) in stage 3 and 0.84 (PM10) and 0.86 (PM2.5) in stage 5. Fit was weaker for PM2.5–10, in summer months, and in southern Italy. Predictions captured annual and daily variability equally well, so they can serve as exposure estimates for studies of both long-term and short-term health effects.

Abstract

Particulate matter (PM) air pollution is one of the major causes of death worldwide, with demonstrated adverse effects from both short-term and long-term exposure. Most epidemiological studies have been conducted in cities because reliable spatiotemporal estimates of particle exposure are lacking in nonurban settings. The objective of this study is to estimate daily concentrations of PM10 (PM < 10 μm), fine particles (PM < 2.5 μm, PM2.5) and coarse particles (PM between 2.5 and 10 μm, PM2.5–10) on a 1-km² grid for 2013–2015 using a machine learning approach, the Random Forest (RF). Separate RF models were defined to: predict PM2.5 and PM2.5–10 concentrations at monitors where only PM10 data were available (stage 1); impute missing satellite Aerosol Optical Depth (AOD) data using estimates from atmospheric ensemble models (stage 2); establish a relationship between measured PM and satellite, land use and meteorological parameters (stage 3); apply the stage 3 model to each 1-km² grid cell of Italy (stage 4); and improve stage 3 predictions by using small-scale predictors computed at the monitor locations or within a small buffer (stage 5). Our models captured most of the PM variability, with mean cross-validation (CV) R² of 0.75 and 0.80 (stage 3) and 0.84 and 0.86 (stage 5) for PM10 and PM2.5, respectively. Model fit was poorer for PM2.5–10, in summer months and in southern Italy. Finally, predictions were equally good at capturing annual and daily PM variability; they can therefore be used as reliable exposure estimates for investigating long-term and short-term health effects.
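
The mean cross-validation R² reported above can be illustrated with a small sketch. This is not the study's code. It assumes a 10-fold split (the abstract does not state the fold design), uses synthetic PM10 and PM2.5 values, and substitutes a one-variable least-squares line for the Random Forest so that it runs with the Python standard library alone. The function names `r_squared`, `fit_line` and `mean_cv_r2` are hypothetical.

```python
import random
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares line y = a + b*x (a stand-in for the Random Forest)."""
    x_mean, y_mean = mean(x), mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def mean_cv_r2(x, y, k=10, seed=0):
    """Fit on k-1 folds, score R² on the held-out fold, average over folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds (k is an assumption)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [a + b * x[i] for i in held_out]
        scores.append(r_squared([y[i] for i in held_out], predicted))
    return mean(scores)


# Synthetic monitor data, for illustration only (not measurements).
rng = random.Random(42)
pm10 = [rng.uniform(10, 80) for _ in range(500)]
pm25 = [0.65 * v + rng.gauss(0, 4) for v in pm10]

cv_r2 = mean_cv_r2(pm10, pm25, k=10)
print(f"mean CV R2: {cv_r2:.2f}")
```

Because every observation is scored by a model that never saw it, the averaged value estimates out-of-sample performance rather than in-sample fit. Replacing `fit_line` with a Random Forest regressor would leave the scoring loop unchanged.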
