Publication | Open Access
Annotating German Clinical Documents for De-Identification
Year: 2019 | Citations: 14 | References: 0
Engineering, Diagnosis, Disease Classification, Corpus Linguistics, Text Mining, Natural Language Processing, Computational Linguistics, Document Classification, Medical History, Biomedical Text Mining, Machine Translation, Health Informatics, Annotation Guidelines, Information Extraction, Clinical Data, Annotation Team, Discharge Summaries, Patient Safety, Medicine, Clinical Database, Emergency Medicine
We devised annotation guidelines for the de-identification of German clinical documents and assembled a corpus of 1,106 discharge summaries and transfer letters with 44K annotated protected health information (PHI) items. After three rounds of iteration, our annotation team reached an inter-annotator agreement of 0.96 on the instance level and 0.97 on the token level of annotation (averaged pair-wise F1 score). To establish a baseline for automatic de-identification on our corpus, we trained a recurrent neural network (RNN) and achieved F1 scores greater than 0.9 on most major PHI categories.
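
To illustrate the agreement measure named in the abstract, here is a minimal sketch of an averaged pair-wise F1 score at the instance and token levels. The abstract does not state the matching criteria, so this sketch assumes exact matching: two instances agree only if start, end, and label are identical, and two tokens agree only if position and label are identical. The annotator names, spans, and labels are hypothetical and are not taken from the corpus.

```python
from itertools import combinations


def f1(a, b):
    """F1 between two annotation sets; symmetric, so either set may act as reference."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty annotation sets count as full agreement
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


def average_pairwise_f1(annotations):
    """Mean F1 over all unordered pairs of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(f1(x, y) for x, y in pairs) / len(pairs)


def to_tokens(spans):
    """Expand (start, end, label) spans, end exclusive, into (position, label) items."""
    return {(i, label) for start, end, label in spans for i in range(start, end)}


# Hypothetical PHI spans from three annotators on one document (token offsets).
instances = {
    "annotator_a": {(0, 2, "NAME"), (5, 6, "DATE"), (9, 12, "LOCATION")},
    "annotator_b": {(0, 2, "NAME"), (5, 6, "DATE"), (9, 11, "LOCATION")},
    "annotator_c": {(0, 2, "NAME"), (9, 12, "LOCATION")},
}

# Instance level: a span counts only if boundaries and label match exactly.
instance_f1 = average_pairwise_f1(instances)

# Token level: partially overlapping spans still earn credit for shared tokens.
tokens = {name: to_tokens(spans) for name, spans in instances.items()}
token_f1 = average_pairwise_f1(tokens)

print(f"instance-level F1: {instance_f1:.3f}")
print(f"token-level F1:    {token_f1:.3f}")
```

In this toy example the token-level score is higher than the instance-level score because a boundary disagreement discards the whole instance but only the differing tokens. The abstract reports a difference in the same direction (0.96 versus 0.97).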