Publication | Closed Access

Estimation of Entropy and Mutual Information

Citations: 1.4K | References: 43 | Year: 2003

TLDR

The paper presents new results on nonparametric entropy and mutual information estimation and proves that misapplying common techniques can yield arbitrarily poor estimates even with unlimited data. The authors use an exact local expansion of the entropy function, a sieve-based framework, and approximation theory to establish consistency, central limit theorems, and optimal bias bounds for common discretized estimators. The study shows that common estimators can be severely biased and that confidence intervals underestimate error, provides a more accurate bias approximation, introduces a new estimator with tight error bounds, and demonstrates its effectiveness on real and simulated data.

Abstract

We present some new results on the nonparametric estimation of entropy and mutual information. First, we use an exact local expansion of the entropy function to prove almost sure consistency and central limit theorems for three of the most commonly used discretized information estimators. The setup is related to Grenander's method of sieves and places no assumptions on the underlying probability measure generating the data. Second, we prove a converse to these consistency theorems, demonstrating that a misapplication of the most common estimation techniques leads to an arbitrarily poor estimate of the true information, even given unlimited data. This “inconsistency” theorem leads to an analytical approximation of the bias, valid in surprisingly small sample regimes and more accurate than the usual (m̂ − 1)/(2N) formula of Miller and Madow over a large region of parameter space. The two most practical implications of these results are negative: (1) information estimates in a certain data regime are likely contaminated by bias, even if “bias-corrected” estimators are used, and (2) confidence intervals calculated by standard techniques drastically underestimate the error of the most common estimation methods. Finally, we note a very useful connection between the bias of entropy estimators and a certain polynomial approximation problem. By casting bias calculation problems in this approximation theory framework, we obtain the best possible generalization of known asymptotic bias results. More interestingly, this framework leads to an estimator with some nice properties: the estimator comes equipped with rigorous bounds on the maximum error over all possible underlying probability distributions, and this maximum error turns out to be surprisingly small. We demonstrate the application of this new estimator on both real and simulated data.
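To make the bias phenomenon discussed in the abstract concrete, here is a minimal sketch (not the paper's estimator) of the plug-in (maximum-likelihood) entropy estimate and the Miller–Madow correction, which adds (m̂ − 1)/(2N) where m̂ is the number of occupied bins. Simulating a uniform distribution in the regime where the number of samples N is comparable to the number of bins m illustrates the strong downward bias that the paper analyzes; the parameter choices below are illustrative assumptions.

```python
import numpy as np

def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy estimate, in nats."""
    n = counts.sum()
    p = counts[counts > 0] / n
    return -np.sum(p * np.log(p))

def miller_madow_entropy(counts):
    """Plug-in estimate plus the (m_hat - 1)/(2N) bias correction,
    where m_hat is the number of bins with at least one observation."""
    n = counts.sum()
    m_hat = np.count_nonzero(counts)
    return plugin_entropy(counts) + (m_hat - 1) / (2 * n)

# Illustrative regime: N samples ~ m bins, where the paper shows
# plug-in estimates are badly biased even after "bias correction".
rng = np.random.default_rng(0)
m, n_samples, trials = 100, 100, 500
true_h = np.log(m)  # entropy of the uniform distribution on m bins

plugin_est, mm_est = [], []
for _ in range(trials):
    samples = rng.integers(0, m, size=n_samples)
    counts = np.bincount(samples, minlength=m)
    plugin_est.append(plugin_entropy(counts))
    mm_est.append(miller_madow_entropy(counts))

print(f"true entropy      : {true_h:.3f} nats")
print(f"plug-in mean      : {np.mean(plugin_est):.3f} "
      f"(bias {np.mean(plugin_est) - true_h:+.3f})")
print(f"Miller-Madow mean : {np.mean(mm_est):.3f} "
      f"(bias {np.mean(mm_est) - true_h:+.3f})")
```

Averaged over trials, the plug-in estimate falls well below log(m), and the Miller–Madow correction recovers only part of the gap, consistent with the abstract's point that standard bias corrections can be insufficient in this data regime.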
