Publication | Open Access
Clustering mixed-type data using a probabilistic distance algorithm
Citations: 13
References: 19
Year: 2022
Cluster Computing, Engineering, Symbolic Data Analysis, Cluster Analysis, Combinatorial Data Analysis, Unsupervised Machine Learning, Optimization-based Data Mining, Data Science, Data Mining, Pattern Recognition, Mixed-type Data, Homogeneous Units, Probabilistic Distance Clustering, Statistics, Document Clustering, Data Modeling, Knowledge Discovery, Functional Data Analysis, Fuzzy Clustering, Big Data
Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it offers fuzzy membership and robustness. However, a common problem in clustering is the treatment of mixed-type data, that is, data combining continuous and categorical variables, which are among the most common variable types. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation design shows its advantages over some state-of-the-art techniques; finally, it is applied to a real data set. The conclusion outlines some future developments.
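To make the idea of "different dissimilarities for different kinds of variables" concrete, the following is a minimal sketch and not the paper's actual method. It assumes a Euclidean distance on the continuous variables and a simple-matching mismatch count on the categorical ones, combined with a hypothetical `weight` parameter; the abstract does not say which dissimilarities the authors use. The membership step assumes the PDQ principle as it is usually stated, namely that the membership probability of a point in a cluster is proportional to the cluster size divided by the point's dissimilarity from the cluster center. The function names are illustrative only.

```python
import math

def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, weight=1.0):
    """Illustrative mixed-type dissimilarity (an assumption, not the paper's).

    Euclidean distance on the continuous part plus `weight` times the
    number of mismatching categories on the categorical part.
    """
    d_num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_num, c_num)))
    d_cat = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return d_num + weight * d_cat

def pdq_memberships(dists, sizes, eps=1e-12):
    """Fuzzy memberships with p_k proportional to q_k / d_k.

    `dists` holds the dissimilarities d_k of one point from each center,
    `sizes` the cluster-size parameters q_k. `eps` guards against a zero
    dissimilarity when the point coincides with a center.
    """
    scores = [q / max(d, eps) for d, q in zip(dists, sizes)]
    total = sum(scores)
    return [s / total for s in scores]

# Two hypothetical centers, each with a continuous and a categorical part.
centers = [((0.0, 0.0), ("a",)), ((3.0, 4.0), ("b",))]
x_num, x_cat = (0.0, 0.0), ("b",)

dists = [mixed_dissimilarity(x_num, x_cat, c_num, c_cat)
         for c_num, c_cat in centers]
probs = pdq_memberships(dists, sizes=[0.5, 0.5])
print(dists, probs)
```

In this toy example the point matches the first center on the continuous variables but the second on the categorical one, so it receives a non-trivial membership in both clusters. A full algorithm would alternate this membership step with updates of the centers and cluster sizes.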