Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence

TLDR

Interest in using machine learning and neuroimaging to detect psychosis at the individual level is high, but the reliability of existing findings is unclear because methodological issues may have inflated results in the literature. The study aimed to determine how well machine learning applied to neuroanatomical data can detect first-episode psychosis while implementing methodological safeguards against overoptimistic results. The authors tested traditional machine learning and deep learning on three feature sets (surface-based regional volumes and cortical thickness, voxel-based gray-matter volume, and voxel-based cortical thickness) across 956 participants in five independent datasets, using nested and cross-site cross-validation to assess reliability. Accuracies ranged from 50% to 70% for surface-based features, 50% to 63% for gray-matter volume, and 51% to 68% for voxel-based cortical thickness.

Abstract

Despite the high level of interest in the use of machine learning (ML) and neuroimaging to detect psychosis at the individual level, the reliability of the findings is unclear due to potential methodological issues that may have inflated the existing literature. This study aimed to elucidate the extent to which the application of ML to neuroanatomical data allows detection of first episode psychosis (FEP), while putting in place methodological precautions to avoid overoptimistic results. We tested both traditional ML and an emerging approach known as deep learning (DL) using 3 feature sets of interest: (1) surface-based regional volumes and cortical thickness, (2) voxel-based gray matter volume (GMV) and (3) voxel-based cortical thickness (VBCT). To assess the reliability of the findings, we repeated all analyses in 5 independent datasets, totaling 956 participants (514 FEP and 444 within-site matched controls). The performance was assessed via nested cross-validation (CV) and cross-site CV. Accuracies ranged from 50% to 70% for surface-based features; from 50% to 63% for GMV; and from 51% to 68% for VBCT. The best accuracies (70%) were achieved when DL was applied to surface-based features; however, these models generalized poorly to other sites. Findings from this study suggest that, when methodological precautions are adopted to avoid overoptimistic results, detection of individuals in the early stages of psychosis is more challenging than originally thought. In light of this, we argue that the current evidence for the diagnostic value of ML and structural neuroimaging should be reconsidered toward a more cautious interpretation.
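The two evaluation safeguards named in the abstract can be illustrated with a small sketch. In nested CV, hyperparameters are tuned in an inner loop that only ever sees the outer training fold, so the outer test fold stays untouched. In cross-site CV, every participant from one site is held out while the model is trained (and tuned) on the remaining sites. The Python below is a dependency-free toy, not the authors' pipeline: the k-nearest-neighbour classifier, the grid of k values, the fold counts, the synthetic data, and all function names (`nested_cv`, `leave_one_site_out`, `tune_k`, and so on) are hypothetical stand-ins. Feature scaling is deliberately fit on training rows only, since estimating it on the full dataset is a common source of leakage.

```python
import math
import random
from collections import Counter


def zscore_fit(X):
    """Per-feature mean and std, estimated on training rows only (avoids leakage)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(X, params):
    means, stds = params
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_predict(X_tr, y_tr, X_te, k):
    """Scale with training statistics, then k-nearest-neighbour majority vote."""
    params = zscore_fit(X_tr)
    X_tr, X_te = zscore_apply(X_tr, params), zscore_apply(X_te, params)
    preds = []
    for x in X_te:
        nearest = sorted((math.dist(x, xt), yt) for xt, yt in zip(X_tr, y_tr))[:k]
        preds.append(Counter(label for _, label in nearest).most_common(1)[0][0])
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def stratified_folds(y, n_folds, seed=0):
    """Split indices into folds that roughly preserve the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def cv_split(n, test_idx):
    held_out = set(test_idx)
    return [i for i in range(n) if i not in held_out], test_idx


def tune_k(X, y, grid, n_inner):
    """Inner loop: pick k using only the rows passed in (an outer training set)."""
    best_k, best_acc = grid[0], -1.0
    for k in grid:
        accs = []
        for test_idx in stratified_folds(y, n_inner):
            tr, te = cv_split(len(y), test_idx)
            pred = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], k)
            accs.append(accuracy([y[i] for i in te], pred))
        if sum(accs) / len(accs) > best_acc:  # ties keep the earlier (smaller) k
            best_k, best_acc = k, sum(accs) / len(accs)
    return best_k


def evaluate(X, y, train_idx, test_idx, grid, n_inner):
    """Tune on the training indices, then score once on the untouched test indices."""
    X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
    k = tune_k(X_tr, y_tr, grid, n_inner)
    pred = fit_predict(X_tr, y_tr, [X[i] for i in test_idx], k)
    return accuracy([y[i] for i in test_idx], pred)


def nested_cv(X, y, grid=(1, 3, 5), n_outer=5, n_inner=3):
    """Mean outer-fold accuracy; hyperparameters never see the outer test fold."""
    scores = [evaluate(X, y, *cv_split(len(y), te), grid, n_inner)
              for te in stratified_folds(y, n_outer, seed=1)]
    return sum(scores) / len(scores)


def leave_one_site_out(X, y, sites, grid=(1, 3, 5), n_inner=3):
    """Cross-site CV: train (and tune) on all other sites, test on the held-out one."""
    results = {}
    for site in sorted(set(sites)):
        test_idx = [i for i, s in enumerate(sites) if s == site]
        train_idx = [i for i, s in enumerate(sites) if s != site]
        results[site] = evaluate(X, y, train_idx, test_idx, grid, n_inner)
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data: 3 hypothetical sites, 2 features, a weak group difference.
    X, y, sites = [], [], []
    for site in ("A", "B", "C"):
        for label in (0, 1):
            for _ in range(20):
                X.append([rng.gauss(0.5 * label, 1.0), rng.gauss(0.0, 1.0)])
                y.append(label)
                sites.append(site)
    print("nested CV accuracy:", round(nested_cv(X, y), 3))
    print("leave-one-site-out:", leave_one_site_out(X, y, sites))
```

Holding out whole sites tests generalization to scanners and samples not seen during training. A model can score well under within-site nested CV and still drop under leave-one-site-out evaluation, which is the pattern the abstract reports for the surface-based DL models.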
