Publication | Open Access
Head/Tail Breaks: A New Classification Scheme for Data with a Heavy-Tailed Distribution
Citations: 367 | References: 24 | Year: 2012
Heavy-tailed distributions are right-skewed, with a minority of large values and a majority of small values. They are often described by a power law, a lognormal, or an exponential function, and are also referred to as scaling or hierarchy. The paper proposes a new classification scheme, head/tail breaks, to identify groupings or hierarchies in data with heavy-tailed distributions. The method partitions all values around the mean into a head (the values above the mean) and a tail (the remaining values). It then recursively applies the same partition to the head until the head no longer exhibits a heavy-tailed pattern. This approach yields a naturally determined number of classes and class intervals, and the authors show that it captures the underlying hierarchy better than Jenks' natural breaks.
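The recursive partition summarized above can be written formally as follows. The notation ($X_k$, $m_k$, $\theta$) is introduced here for illustration and is not taken from the paper. Reading "no longer heavy-tailed" as "the head is no longer a minority of at most a fraction $\theta$ of the values" is an assumption. With $X_0$ the full dataset:

$$
m_k = \frac{1}{|X_k|}\sum_{x \in X_k} x, \qquad
X_{k+1} = \{\, x \in X_k : x > m_k \,\}, \qquad k = 0, 1, 2, \dots
$$

Each mean $m_k$ is kept as a class break while $0 < |X_{k+1}| \le \theta\,|X_k|$. The first $k$ at which this condition fails ends the recursion, so $K$ retained means give $K + 1$ classes.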
This paper introduces a new classification scheme, head/tail breaks, for finding groupings or hierarchy in data with a heavy-tailed distribution. Heavy-tailed distributions are heavily right-skewed, with a minority of large values in the head and a majority of small values in the tail, and they are commonly characterized by a power law, a lognormal, or an exponential function. For example, a country's population is often distributed in such a heavy-tailed manner, with a minority of people (e.g., 20 percent) in the countryside and the vast majority (e.g., 80 percent) in urban areas. This heavy-tailed distribution is also called scaling, hierarchy, or scaling hierarchy. The new classification scheme partitions all of the data values around the mean into two parts. It then repeats the process iteratively on the values above the mean (the head) until the values in the head are no longer heavy-tailed. Thus, both the number of classes and the class intervals are determined naturally. We therefore claim that the new classification scheme is more natural than Jenks' natural breaks for finding the groupings or hierarchy in data with a heavy-tailed distribution. We demonstrate the advantages of the head/tail breaks method over Jenks' natural breaks in capturing the underlying hierarchy of the data.

Keywords: data classification, head/tail division rule, natural breaks, scaling, hierarchy
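The iterative procedure described in the abstract can be sketched in Python as below. This is a minimal illustration, not the author's implementation. The function names `head_tail_breaks` and `classify` are hypothetical. The test for "no longer heavy-tailed" is also an assumption: it is approximated here as the head exceeding 40 percent of the current values, controlled by the hypothetical parameter `head_limit`.

```python
from statistics import mean


def head_tail_breaks(values, head_limit=0.4):
    """Compute head/tail break points (an illustrative sketch, not the paper's code).

    head_limit is an assumed, hypothetical stopping threshold: a split is kept
    only while the head (values above the mean) makes up at most this fraction
    of the current values, i.e. while the head is still a minority.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = mean(data)
        head = [x for x in data if x > m]
        # Stop when nothing lies above the mean or the head is no longer a
        # minority, which this sketch treats as "no longer heavy-tailed".
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(m)  # the mean becomes a class boundary
        data = head       # recurse on the head only
    return breaks


def classify(value, breaks):
    """Return a class index: 0 for the tail, len(breaks) for the topmost head."""
    return sum(value > b for b in breaks)


if __name__ == "__main__":
    # Hypothetical Zipf-like sample: 1, 1/2, ..., 1/10.
    sample = [1 / n for n in range(1, 11)]
    cuts = head_tail_breaks(sample)
    print(cuts)
    print([classify(x, cuts) for x in sample])
```

For the ten values 1, 1/2, ..., 1/10, this sketch returns two breaks (about 0.293 and 0.611), so the data fall into three classes: {1}, {1/2, 1/3}, and the remaining seven values. Both the number of classes and the boundaries come out of the data rather than being chosen in advance, which is the property the abstract emphasizes. A different `head_limit` can change the result.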