Population Definition, Sample Selection, and Calibration Procedures for Near Infrared Reflectance Spectroscopy

TLDR

Near‑infrared spectroscopy depends on selecting an appropriate sample population and optimal mathematical procedures to achieve accurate calibration. The study aimed to evaluate two algorithms, CENTER and SELECT, for defining the sample population and selecting calibration samples. The algorithms established population boundaries using standardized Mahalanobis distance and selected a small, structured set of samples, which were then used to compare modified partial least squares regression (MPLSR) and modified stepwise regression (MSR) on two diverse populations. A standardized H distance of 3.0 excluded outliers while a minimum distance of 0.6 between samples provided sufficient calibration data; both MPLSR and MSR yielded acceptable validation statistics, with MPLSR improving the standard error of performance by 18% over MSR.

Abstract

Near infrared spectroscopy relies heavily on the collection of an appropriate population of samples for calibration and on the best mathematical procedure to obtain the most accurate calibration. The purpose of this study was to evaluate two algorithms (CENTER and SELECT) for defining the population and selecting samples for calibration. The selected samples were used to compare the modified partial least squares regression (MPLSR) calibration method with the modified stepwise regression (MSR) calibration method. The algorithms were developed to (i) establish the boundaries of a population of samples in terms of the standardized Mahalanobis distance (H) from the mean and (ii) select a small, structured set of samples for calibration using the standardized H distance between sample pairs. Two diverse populations of samples were used to test these approaches. A standardized H distance of 3.0 from the mean was used as the boundary for excluding spectral outliers from a population, and a minimum standardized H distance of 0.6 between samples provided an adequate number of calibration samples for accurate predictions. Both regression methods provided acceptable validation statistics for crude protein, acid detergent fiber, and in vitro dry matter disappearance. The MPLSR calibration method gave an overall 18% improvement in standard error of performance (SEP) compared with the MSR calibration method.
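
The two-step procedure described above (bound the population, then thin it to a structured calibration set) can be illustrated with a minimal Python sketch. This is not the authors' CENTER or SELECT code, and the abstract does not specify several details, so the following are assumptions made only for illustration: spectra are first reduced to principal component scores scaled to unit variance; the standardized H is taken as the squared Mahalanobis distance in that score space divided by the number of components, so its population average is close to 1; and the selection step is read as a greedy rule that keeps the sample with the most close neighbors and discards those neighbors. The function names `pc_scores`, `center`, and `select` are hypothetical.

```python
import numpy as np

def pc_scores(spectra, n_components):
    """Project mean-centered spectra onto the leading principal
    components and scale each score column to unit variance."""
    centered = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    return scores / scores.std(axis=0, ddof=1)

def center(scores, h_max=3.0):
    """Return the standardized H of every sample from the population
    mean and a mask of samples inside the h_max boundary.
    Assumption: H = squared Mahalanobis distance / number of components."""
    k = scores.shape[1]
    h = (scores ** 2).sum(axis=1) / k
    return h, h <= h_max

def select(scores, min_h=0.6):
    """Greedy selection (an assumed reading of the abstract): repeatedly
    keep the sample with the most remaining neighbors closer than min_h,
    then drop that sample's neighbors from further consideration."""
    k = scores.shape[1]
    diff = scores[:, None, :] - scores[None, :, :]
    pair_h = (diff ** 2).sum(axis=2) / k  # standardized H between pairs
    remaining = np.ones(len(scores), dtype=bool)
    chosen = []
    while remaining.any():
        # Neighbor relation restricted to samples still in play;
        # each remaining sample counts itself, so its count is >= 1.
        close = (pair_h < min_h) & remaining[:, None] & remaining[None, :]
        pick = int(np.argmax(close.sum(axis=1)))
        chosen.append(pick)
        remaining &= ~close[pick]  # removes the pick and its neighbors
    return np.array(chosen)
```

A typical call sequence under these assumptions would be `scores = pc_scores(spectra, 5)`, then `h, inside = center(scores, 3.0)`, then `select(scores[inside], 0.6)`, which returns indices into the retained samples. Every pair of selected samples is at least `min_h` apart, so a larger `min_h` yields a smaller calibration set.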