Tane: An Efficient Algorithm for Discovering Functional and Approximate Dependencies

TLDR

The discovery of functional dependencies from relations is an important database analysis technique. We present Tane, an efficient algorithm for finding functional dependencies from large databases. Tane partitions rows by attribute values, enabling fast validity testing and efficient discovery of approximate functional dependencies while easily identifying erroneous rows. Experiments show that Tane is fast in practice, improving running times by several orders of magnitude over previous methods and scaling to much larger datasets.

Abstract

The discovery of functional dependencies from relations is an important database analysis technique. We present Tane, an efficient algorithm for finding functional dependencies from large databases. Tane is based on partitioning the set of rows with respect to their attribute values, which makes testing the validity of functional dependencies fast even for a large number of tuples. The use of partitions also makes the discovery of approximate functional dependencies easy and efficient and the erroneous or exceptional rows can be identified easily. Experiments show that Tane is fast in practice. For benchmark databases the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.

References

Page 1

	Year	Citations

Page 1