TLDR

clValid is an R package for validating clustering results with internal, stability, and biological measures; the biological measures can draw automatically on Gene Ontology annotations. Users can choose from nine clustering algorithms in existing R packages, plus a self-organizing tree algorithm (SOTA) supplied by the package, and request any combination of validation measures and clustering methods in a single function call. Evaluating several algorithms over a range of cluster counts at once helps identify the most suitable method and number of clusters. The call returns an S4 object of class "clValid", with summary, plot, and print methods for displaying the optimal validation scores and extracting the clustering results.

Abstract

The R package clValid contains functions for validating the results of a clustering analysis. Three main types of cluster validation measures are available: "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to evaluate several clustering algorithms simultaneously while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically use the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods that allow the user to display the optimal validation scores and extract clustering results.
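
As a rough illustration of the single-call workflow described above, the following minimal sketch compares three clustering methods over two to six clusters using the internal measures. The choice of the built-in iris data, the scaling step, and the variable names (`measurements`, `result`) are assumptions made for this example and do not come from the package authors; it also assumes the clValid package is installed.

```r
# Minimal sketch: internal validation of several methods in one call.
# Assumes the clValid package is installed; dataset and names are illustrative.
library(clValid)

# Numeric matrix with explicit row names (one row per item to cluster).
measurements <- scale(as.matrix(iris[, 1:4]))
rownames(measurements) <- paste0("obs", seq_len(nrow(measurements)))

set.seed(1)  # kmeans uses random starting centers

# One call evaluates every method for every requested number of clusters.
result <- clValid(
  measurements,
  nClust     = 2:6,
  clMethods  = c("hierarchical", "kmeans", "pam"),
  validation = "internal"
)

summary(result)        # full table of scores plus the optimal ones
optimalScores(result)  # best score, method, and cluster count per measure
```

Stability or biological measures can be requested by passing "stability" or "biological" (or a vector of several types) to the `validation` argument; biological validation additionally needs gene annotation supplied through the `annotation` argument.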
