Predictive ability of machine learning methods for massive crop yield prediction

TLDR

Accurate crop yield estimation is essential for agricultural planning, yet existing studies compare few machine-learning methods across a limited number of crops, leaving a knowledge gap. This study compares the predictive accuracy of machine-learning and linear-regression techniques on ten crop datasets. Five methods (multiple linear regression, M5-Prime regression trees, multilayer perceptrons, support vector regression, and k-nearest neighbors) were trained on data from an irrigation zone in Mexico and evaluated with four metrics (RMSE, RRSE, normalized MAE, and correlation factor R) on samples from two consecutive years. M5-Prime and k-nearest neighbors achieved, respectively, the lowest errors (RMSE 5.14 and 4.91, RRSE 79.46% and 79.78%, MAE 18.12% and 19.42%) and the highest correlation factors (0.41 and 0.42). Because M5-Prime produced the largest number of crop yield models with the lowest errors, it is well suited to massive crop yield prediction.

Abstract

Accurate yield estimation for the numerous crops involved is an important issue in agricultural planning. Machine learning (ML) is an essential approach for achieving practical and effective solutions to this problem. Many comparisons of ML methods for yield prediction have been made in search of the most accurate technique. Generally, however, the number of crops and techniques evaluated is too low to provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction on ten crop datasets. Multiple linear regression, M5-Prime regression trees, multilayer perceptron neural networks, support vector regression, and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMSE), the root relative square error (RRSE), the normalized mean absolute error (MAE), and the correlation factor (R). Real data from an irrigation zone in Mexico were used to build the models, which were tested with samples from two consecutive years. The results show that the M5-Prime and k-nearest neighbor techniques obtain, respectively, the lowest average RMSE errors (5.14 and 4.91), the lowest RRSE errors (79.46% and 79.78%), the lowest average MAE errors (18.12% and 19.42%), and the highest average correlation factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for massive crop yield prediction in agricultural planning.
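
The four validation metrics named above can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are the common textbook ones and should be read as assumptions: in particular, the normalization of the MAE (here, division by the mean observed yield) and the use of the Pearson coefficient for R may differ from what the paper used. The function names and the sample yields are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the units of the yield."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def rrse(actual, predicted):
    """Root relative square error (%), relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    model_sq = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_sq = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(model_sq / baseline_sq)


def normalized_mae(actual, predicted):
    """Mean absolute error as a percentage of the mean observed value.

    Assumption: the paper's normalization is not stated in the abstract.
    """
    n = len(actual)
    mae = sum(abs(p - a) for a, p in zip(actual, predicted)) / n
    return 100.0 * mae / (sum(actual) / n)


def correlation(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical observed and predicted yields, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 3.0, 7.0, 7.0]

for name, metric in [("RMSE", rmse), ("RRSE %", rrse),
                     ("MAE %", normalized_mae), ("R", correlation)]:
    print(name, round(metric(observed, estimated), 4))
```

Under these definitions, an RRSE below 100% means a model does better than always predicting the mean observed yield, which is one way to read the reported values of about 79%.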
