Publication | Open Access
Mapping Population Distribution Based on XGBoost Using Multisource Data
Citations: 43
References: 63
Year: 2021
Environmental Monitoring, Engineering, Building Data, Social Sciences, Geospatial Mapping, Data Science, Big Data, Multiple Classifier System, Statistics, Geography, Urban Ecology, Population Study, Land Cover Map, Population Distribution, Data Distribution, Remote Sensing, Extreme Gradient Boosting, Random Forest, Ensemble Algorithm
Mapping the fine-scale distribution of the population is essential to the study of human activities, and machine learning models make it possible to mine more reliable open-access big data for this purpose. However, how best to combine multi-source datasets and multi-dimensional features for population estimation remains unclear, and different models have yet to be compared. In this study, related features were first extracted from multi-source data, including building data, geographic big data, remote sensing data, and basic geographic data. Then, the effective indicators with higher contribution weights were selected from the multi-source data, which reduces noise and unstable model fitting. Finally, a population distribution map on a 100-meter grid was produced for Shenzhen in 2019, and the estimation results of five tree-based ensemble learning models were compared at the community scale: random forest (RF), gradient boosted decision tree (GBDT), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). Our results showed that: (1) building data and geographic big data better reflected the spatial heterogeneity of the population; (2) indicator selection effectively improved the estimation accuracy of the population mapping; (3) compared with the other models, XGBoost had the largest R² (80%), the smallest RMSE and MAE, the highest percentage of accurately estimated communities (-0.3 < RE < 0.3, 65%), and a shorter training time. Therefore, XGBoost was chosen for mapping population distribution instead of GBDT, LightGBM, CatBoost, and RF. Our proposed method for population mapping can help to optimize the allocation of resources and guide a more scientific path for urban development.
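The community-scale comparison above rests on four quantities: R², RMSE, MAE, and the share of communities whose relative error (RE) falls between -0.3 and 0.3. The following is a minimal sketch of how these could be computed, not the authors' implementation. It assumes RE = (estimated - actual) / actual and R² = 1 - SS_res / SS_tot, neither of which is spelled out in the abstract; the function name `evaluate_estimates` and the sample numbers are hypothetical.

```python
import math


def evaluate_estimates(actual, estimated, re_bound=0.3):
    """Score population estimates against reference counts.

    Assumptions (not stated in the abstract):
      - RE = (estimated - actual) / actual
      - R2 = 1 - SS_res / SS_tot (coefficient of determination)
    """
    if len(actual) != len(estimated) or len(actual) == 0:
        raise ValueError("actual and estimated must be non-empty and equal in length")
    if any(a <= 0 for a in actual):
        raise ValueError("relative error needs strictly positive actual counts")

    n = len(actual)
    errors = [e - a for a, e in zip(actual, estimated)]

    mean_actual = sum(actual) / n
    ss_res = sum(err ** 2 for err in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual counts are identical")

    relative_errors = [err / a for a, err in zip(actual, errors)]
    # A community counts as accurately estimated when -bound < RE < bound.
    accurate = sum(1 for re in relative_errors if -re_bound < re < re_bound)

    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(err) for err in errors) / n,
        "accurate_share": accurate / n,
    }


# Hypothetical community populations, for illustration only.
census = [100, 200, 300, 400]
model_output = [110, 190, 450, 400]
scores = evaluate_estimates(census, model_output)
```

Applying the same function to the output of each candidate model would give one row per model for a side-by-side comparison, with training time measured separately.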