Publication | Closed Access
From student hard drive to web corpus (part 1): the design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP)
Citations: 136
References: 9
Year: 2011
Engineering, Writer Identification, Student Hard Drive, Corpus Linguistics, Text Mining, Michigan Students, Natural Language Processing, Language Documentation, Information Retrieval, Document Engineering, Computational Linguistics, Document Analysis, Document Classification, Online Submission Process, Language Studies, Content Analysis, Part 1, Knowledge Discovery, Author Profiling, Terminology Extraction, Michigan Corpus, Topic Model, Language Corpus, Linguistics
In this paper, we provide a detailed account of the steps that were central to designing and compiling the Michigan Corpus of Upper-level Student Papers (MICUSP). MICUSP is a new collection of 829 papers (around 2.6 million words) written by University of Michigan students in their final undergraduate year or in their first three years of graduate education. The papers come from sixteen disciplines, ranging from Humanities and Arts to Physical Sciences, and represent a range of different text types. We give an overview of the design of MICUSP, the online submission process used to collect the papers, and their text-type classification.