Concepedia

Publication | Closed Access

DiskReduce

139

Citations

19

References

2009

Year

Abstract

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically three copies of everything. Alternatively high performance computing, which has comparable scale, and smaller scale enterprise storage systems get similar tolerance for multiple failures from lower overhead erasure encoding, or RAID, organizations. DiskReduce is a modification of the Hadoop distributed file system (HDFS) enabling asynchronous compression of initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.

References

YearCitations

2008

18.4K

2003

5K

1960

3.5K

1988

2.3K

1995

690

2004

489

1996

394

2008

324

2009

227

2005

188

Page 1