Publication | Open Access
Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data
301 Citations | 37 References | Published 2019
Normalization Methods, Engineering, Data Science, Cardiovascular Disease, Biomedical Data Science, Computational Biology, Data Normalization, Biostatistics, Omics, Biomedical Analysis, Metabolomics, Medicine, Systematic Error Removal, Statistics, Random Forest
Large-scale untargeted lipidomics experiments measure hundreds to thousands of samples over days or weeks, during which batch effects, longitudinal drifts, and instrument-to-instrument variation introduce systematic errors that mask true biological signals. To address this, the authors propose a quality control (QC)-based normalization strategy, systematic error removal using random forest (SERRF), which uses QC pool samples to remove unwanted systematic variation. SERRF was benchmarked against 15 other commonly used normalization methods on six lipidomics datasets from three large cohort studies (832, 1162, and 2696 samples). It lowered the average technical error to 5% relative standard deviation (RSD), outperforming the other methods and revealing biological variance of interest.
Large-scale untargeted lipidomics experiments involve the measurement of hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. Such extensive data acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, or even instrument-to-instrument variation. Technical data variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on quality control (QC) pool samples. This method, called systematic error removal using random forest (SERRF), eliminates the unwanted systematic variations in large sample sets. We compared SERRF with 15 other commonly used normalization methods using six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical errors for these data sets to 5% relative standard deviation. We conclude that SERRF outperforms other existing methods and can significantly reduce the unwanted systematic variation, revealing biological variance of interest.
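The technical error quoted above is a relative standard deviation (RSD): the standard deviation of a compound's intensity across repeated measurements, divided by its mean and expressed in percent. The sketch below shows only that calculation, not SERRF itself. It assumes the sample standard deviation is used, and the function name and intensity values are hypothetical, chosen for illustration.

```python
import statistics


def relative_standard_deviation(values):
    """Return the RSD in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical peak intensities of one lipid across five repeated QC injections
# (illustrative numbers, not data from the study).
qc_intensities = [980.0, 1010.0, 1000.0, 1030.0, 980.0]

rsd = relative_standard_deviation(qc_intensities)
print(f"QC RSD: {rsd:.1f}%")  # prints "QC RSD: 2.1%"
```

Averaging this value over all measured compounds would give one "average technical error" figure per data set, under the assumption that each compound is weighted equally.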