Publication | Closed Access
Cloud computing data capsules for non-consumptive use of texts
Citations: 34
References: 8
Year: 2014
Unknown Venue
Digital Data Sources, Engineering, Cloud Computing Architecture, Information Forensics, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Management, Data Integration, Cloud Data Management, Content Analysis, Data Management, Machine Translation, Unstructured Data, Knowledge Discovery, Computer Science, Information Management, Data Capsules, Information Extraction, Content Similarity Detection, Cloud Computing, Text Processing, Document Processing
As digital data sources grow in number and size, they present an opportunity for computational investigation by means of text mining, natural language processing (NLP), and other text analysis techniques. In this paper we propose a virtual machine (VM) framework and methodology for non-consumptive text analysis. Under a remote VM model, the VM is configured with software and tools for text analysis. When the analysis is complete, the VM is wiped and its resources are released for other users. Our approach extends the VM by turning it into a data capsule that prevents leakage of copyrighted content in the event that the VM is compromised. The HathiTrust Research Center Data Capsules framework has seen early use against the HathiTrust repository of digitized books from university libraries nationwide.
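The VM lifecycle described in the abstract (configure, analyze, wipe and release) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `Capsule`, `capsule_session`, and `word_count` are hypothetical names, not part of the HathiTrust Research Center software, and the sketch does not model how a real data capsule prevents leakage. It only shows a session that is always wiped on exit and an analysis that returns a derived statistic rather than the text itself.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Hypothetical stand-in for a remote analysis VM (not the HTRC API)."""
    owner: str
    tools: list = field(default_factory=list)
    state: str = "provisioned"

    def configure(self, tools):
        # Record the text analysis tooling installed on the VM.
        self.tools = list(tools)
        self.state = "configured"

    def wipe(self):
        # Discard everything held by the VM so its resources can be reused.
        self.tools.clear()
        self.state = "released"


@contextmanager
def capsule_session(owner, tools):
    """Provision and configure a capsule, and always wipe it on exit."""
    capsule = Capsule(owner)
    capsule.configure(tools)
    try:
        yield capsule
    finally:
        capsule.wipe()


def word_count(capsule, text):
    # Toy analysis: only a derived count is returned, never the text.
    if capsule.state != "configured":
        raise RuntimeError("capsule is not ready for analysis")
    return len(text.split())


with capsule_session("researcher", ["tokenizer"]) as capsule:
    count = word_count(capsule, "call me Ishmael")

print(count, capsule.state)
```

Using a context manager here is a design choice of the sketch: the wipe step runs even if the analysis raises an exception, which mirrors the idea that resources are released once a session ends.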