Concepedia

Publication | Open Access

Introduction to machine learning: k-nearest neighbors

Citations: 1.2K

References: 7

Year: 2016

TLDR

Machine learning is widely used across science, yet its adoption in the medical literature remains limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple algorithm whose performance, most often summarized by average accuracy, depends heavily on the choice of k, the distance metric, and the predictors. The article introduces the fundamentals of the kNN algorithm and demonstrates how to implement kNN modeling in R: preparing the dataset, applying the knn() function, and then evaluating the model's diagnostic performance.

Abstract

Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple machine learning method. The article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After the outcome is predicted with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic to reflect the performance of the kNN algorithm. Factors such as the value of k, the distance calculation, and the choice of appropriate predictors all have a significant impact on model performance.
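
The article itself works in R with the knn() function, and the sketch below is not that code. It is a minimal plain-Python illustration of the ideas the abstract names: a distance calculation (Euclidean distance is assumed here), a majority vote among the k nearest training points, and accuracy as the performance statistic. The helper names (euclidean, knn_predict, accuracy) and the toy data are hypothetical and invented for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Take the label with the most votes; ties get no special handling in this sketch.
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_x, train_y, q, k) == y
                  for q, y in zip(test_x, test_y))
    return correct / len(test_y)


# Toy data (made up): class "A" clusters near (1, 1), class "B" near (3, 3),
# plus one stray "B" point at (1.6, 1.6) sitting next to the "A" cluster.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (1.6, 1.6)]
train_y = ["A", "A", "A", "B", "B", "B", "B"]
test_x = [(1.5, 1.5), (3.1, 3.0)]
test_y = ["A", "B"]

for k in (1, 3):
    print(k, accuracy(train_x, train_y, test_x, test_y, k))
# prints:
# 1 0.5
# 3 1.0
```

With k = 1 the stray "B" point alone decides the label of (1.5, 1.5), so one of the two test points is misclassified. With k = 3 the two nearby "A" points outvote it. This is a small instance of the abstract's point that the value of k affects model performance. The same sensitivity applies to the distance calculation: predictors on very different scales would dominate an unscaled Euclidean distance, which is one reason the dataset needs preparation before modeling.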
