VisReduce: Fast and responsive incremental information visualization of large datasets

Abstract

The performance and responsiveness of visual analytics systems for exploratory data analysis of large datasets have been a long-standing problem. We propose a method for incrementally computing visualizations in a distributed fashion by combining a modified MapReduce-style algorithm with a compressed columnar data store. This results in significant improvements in performance and responsiveness when constructing commonly encountered information visualizations, e.g., bar charts, scatterplots, heat maps, cartograms and parallel coordinate plots. We compare our method with one that builds visualizations by querying three other readily available database and data warehouse systems: PostgreSQL, Cloudera Impala and the MapReduce-based Apache Hive. We show that our end-to-end approach allows for greater speed and guaranteed end-user responsiveness, even in the face of large, long-running queries.
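
The idea of incrementally computing a visualization can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the function names (map_chunk, incremental_bar_chart, split) and the toy data are assumptions made for illustration, and neither distribution across workers nor the compressed columnar store is modeled. Each chunk is mapped to a partial aggregate, the partial is merged into a running total, and a snapshot is emitted after every merge so that a front end could redraw a bar chart before the full dataset has been processed.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence


def map_chunk(chunk: Sequence[str]) -> Counter:
    """Map step (hypothetical helper): count categories within one chunk."""
    return Counter(chunk)


def incremental_bar_chart(chunks: Iterable[Sequence[str]]) -> Iterator[Dict[str, int]]:
    """Merge partial counts chunk by chunk, yielding a snapshot after each merge."""
    totals: Counter = Counter()
    for chunk in chunks:
        totals.update(map_chunk(chunk))  # reduce step: fold the partial into the total
        yield dict(totals)               # intermediate result a UI could draw


def split(values: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Toy data standing in for one categorical column of a large dataset.
data = ["a", "b", "a", "c", "b", "a"]
snapshots = list(incremental_bar_chart(split(data, 2)))

for step, bars in enumerate(snapshots, start=1):
    print(f"after chunk {step}: {bars}")
```

Because counts are merged by addition, the order in which chunks arrive does not change the final totals, which is what would allow partial results from several workers to be combined as they become available.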
