Comparing Dissimilarity Measures For Probabilistic Symbolic Objects

Abstract

Symbolic data analysis generalizes some standard statistical data mining methods, such as those developed for classification and clustering tasks, to the case of symbolic objects (SOs). These objects, informally defined as “aggregated data” because they synthesize information concerning a group of individuals of a population, ensure the confidentiality of the original data; nevertheless, they pose new problems, which find a solution in symbolic data analysis. A by-product of working with aggregate data is the possibility of dealing with data from complex questionnaires, where multiple answers are possible or constraints among different answers exist. Comparing SOs is an important step of symbolic data analysis. It can be useful either to cluster some SOs or to discriminate between them, or even to order SOs according to their degree of generalization. This paper presents a comparative study aiming at evaluating the degree of dissimilarity between the objects of a restricted class of symbolic data, namely Probabilistic Symbolic Objects. To define a ground truth for the empirical evaluation, a data set with understandable and explainable properties has been selected. In the experiment, only two dissimilarity measures, among the seven we have studied, seem to have a more stable behaviour.
