Publication | Open Access
ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision
23
Citations
31
References
2021
Year
Scientific literature analysis needs fine-grained named entity recognition (NER) to provide a wide range of information for scientific discovery. For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts. On the other hand, domain-specific ontologies and knowledge bases (KBs) can be easily accessed, constructed, or integrated, which makes distant supervision realistic for fine-grained chemistry NER. In distant supervision, training labels are generated by matching mentions in a document with the concepts in the knowledge bases (KBs). However, this kind of KB-matching suffers from two major challenges: incomplete annotation and noisy annotation. We propose CHEMNER, an ontologyguided, distantly-supervised method for finegrained chemistry NER to tackle these challenges. It leverages the chemistry type ontology structure to generate distant labels with novel methods of flexible KB-matching and ontology-guided multi-type disambiguation. It significantly improves the distant label generation for the subsequent sequence labeling model training. We also provide an expertlabeled, chemistry NER dataset with 62 finegrained chemistry types (e.g., chemical compounds and chemical reactions). Experimental results show that CHEMNER is highly effective, outperforming substantially the stateof-the-art NER methods (with .25 absolute F1 score improvement). * * The first two authors contributed equally to this work and should be considered as joint first authors.
| Year | Citations | |
|---|---|---|
Page 1
Page 1