Publication | Closed Access
CCDB: A Corpus-Linguistic Research & Development Workbench
Year: 2007
CCDB, Engineering, Speech Corpus, Corpus Linguistics, Language Processing, Text Mining, Applied Linguistics, Natural Language Processing, Syntax, Information Retrieval, Data Science, Computational Linguistics, Language Engineering, Grammar, Corpus Analysis, Language Studies, Learner Corpus Linguistics, Computational Lexicology, Terminology Extraction, Empirical Baseline Framework, Collocation Profiles, Lexical Resource, Development Workbench, Language Corpus, Linguistics
Within a strictly corpus-driven paradigm, in-depth profiling of many linguistic phenomena requires fast access to massive amounts of data derived from very large corpora. This poster presentation describes the CCDB, an empirical baseline framework established for this purpose in 2001 at the Institute for the German Language (IDS) in Mannheim. We use the CCDB to study, develop, and evaluate methods for the data-driven exploration and modelling of language use. The CCDB can be accessed through a public web interface at http://corpora.ids-mannheim.de/ccdb/. The paper is structured as follows. We first describe the kind of data that the framework provides. We then briefly discuss the notion of similarity of collocation profiles. Finally, we give examples of specific CCDB-based methods that we have recently been working on.