Publication | Open Access
Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data.
Year: 2016 | Citations: 1.1K | References: 108
The Pearson product–moment correlation coefficient (<i>r<sub>p</sub></i>) and the Spearman rank correlation coefficient (<i>r<sub>s</sub></i>) are widely used in psychological research. We compare <i>r<sub>p</sub></i> and <i>r<sub>s</sub></i> on 3 criteria: variability, bias with respect to the population value, and robustness to an outlier. Using simulations across low (N = 5) to high (N = 1,000) sample sizes, we show that, for normally distributed variables, <i>r<sub>p</sub></i> and <i>r<sub>s</sub></i> have similar expected values but <i>r<sub>s</sub></i> is more variable, especially when the correlation is strong. However, when the variables have high kurtosis, <i>r<sub>p</sub></i> is more variable than <i>r<sub>s</sub></i>. Next, we conducted a sampling study of a psychometric dataset featuring symmetrically distributed data with light tails, and of 2 Likert-type survey datasets, 1 with light-tailed and the other with heavy-tailed distributions. Consistent with the simulations, <i>r<sub>p</sub></i> had lower variability than <i>r<sub>s</sub></i> in the psychometric dataset. In the survey datasets with heavy-tailed variables in particular, <i>r<sub>s</sub></i> had lower variability than <i>r<sub>p</sub></i>, and often corresponded more accurately to the population Pearson correlation coefficient (<i>R<sub>p</sub></i>) than <i>r<sub>p</sub></i> did. The simulations and the sampling studies showed that variability in terms of standard deviations can be reduced by about 20% by choosing <i>r<sub>s</sub></i> instead of <i>r<sub>p</sub></i>. In comparison, increasing the sample size by a factor of 2 results in a 41% reduction of the standard deviations of <i>r<sub>s</sub></i> and <i>r<sub>p</sub></i>. In conclusion, <i>r<sub>p</sub></i> is suitable for light-tailed distributions, whereas <i>r<sub>s</sub></i> is preferable when variables feature heavy-tailed distributions or when outliers are present, as is often the case in psychological research.
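The variability comparison described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation code. The sample size (N = 20), the population correlation (0.8), the number of replications (2,000), the seed, and the use of a sign-preserving cube to produce heavy tails are all assumptions chosen for illustration, and the helper names `pearson`, `ranks`, `spearman`, and `simulate` are hypothetical. Only the Python standard library is used.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n, rho, reps, heavy_tailed, seed):
    """Return (SD of r_p, SD of r_s) over `reps` samples of size `n`."""
    rng = random.Random(seed)
    rp_values, rs_values = [], []
    for _ in range(reps):
        x, y = [], []
        for _ in range(n):
            # Two standard normal variables with correlation rho.
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            a, b = z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2
            if heavy_tailed:
                # Assumption: cubing is a monotone transform that yields
                # high kurtosis; the paper may construct heavy tails differently.
                a, b = a ** 3, b ** 3
            x.append(a)
            y.append(b)
        rp_values.append(pearson(x, y))
        rs_values.append(spearman(x, y))
    return statistics.stdev(rp_values), statistics.stdev(rs_values)


# Illustrative settings (assumptions, not values taken from the paper).
sd_p_normal, sd_s_normal = simulate(20, 0.8, 2000, heavy_tailed=False, seed=1)
sd_p_heavy, sd_s_heavy = simulate(20, 0.8, 2000, heavy_tailed=True, seed=1)

print(f"normal data:       SD(r_p) = {sd_p_normal:.3f}, SD(r_s) = {sd_s_normal:.3f}")
print(f"heavy-tailed data: SD(r_p) = {sd_p_heavy:.3f}, SD(r_s) = {sd_s_heavy:.3f}")
```

Under these settings the sketch should show the pattern the abstract reports: <i>r<sub>s</sub></i> is more variable for normally distributed data, while <i>r<sub>p</sub></i> is more variable once the tails become heavy. Because cubing is monotone, the ranks, and therefore <i>r<sub>s</sub></i>, are unaffected by the transform, which is one way to see why the rank-based coefficient is insensitive to tail weight.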