Data-intensive text processing with MapReduce

Citations: 450

References: 134

Year: 2009

Jimmy Lin, Chris Dyer

Cluster Computing, Engineering, Map-reduce, Text Mining, Information Retrieval, Data Science, Hadoop Programs, Data-intensive Platform, Data Integration, Parallel Computing, Data Management, Hadoop Clusters, Data-intensive Text Processing, Knowledge Discovery, Computer Science, Data-intensive Computing, Cloud Computing, Parallel Programming, Massive Data Processing, Big Data

Abstract

This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model [1], using the open-source Hadoop implementation. The focus will be on scalability and the tradeoffs associated with distributed processing of large datasets. Content will include general discussions of algorithm design, presentation of illustrative algorithms, case studies in human language technology (HLT) applications, and practical advice on writing Hadoop programs and running Hadoop clusters.
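
For readers unfamiliar with the programming model named in the abstract, the following is a minimal in-memory sketch of word counting in the MapReduce style. It is not taken from the tutorial and does not use Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the toy documents are assumptions made for illustration, and the whole computation runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper (hypothetical name): emit a (word, 1) pair for each token."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, imitating the framework's
    shuffle-and-sort step that sits between mappers and reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer (hypothetical name): sum the partial counts for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a dict of documents."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Toy input invented for this sketch, not data from the tutorial.
    docs = {"d1": "a rose is a rose", "d2": "a daisy"}
    print(word_count(docs))
```

In a real Hadoop job the mapper and reducer would run on separate machines, and the framework would handle the grouping step. The sketch only shows how a computation divides into the two user-supplied functions.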
