Publication | Open Access
Cybersecurity named entity recognition using bidirectional long short-term memory with conditional random fields
Year: 2020
Citations: 93
References: 13
Engineering, Information Security, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Entity Recognition, Bidirectional LSTM Layer, Language Studies, Named-entity Recognition, Sequence Modelling, Entity Disambiguation, NLP Task, Knowledge Discovery, Network Texts, Computer Science, Information Extraction, Conditional Random Fields
Network texts have become important carriers of cybersecurity information on the Internet. These texts include the latest security events such as vulnerability exploitations, attack discoveries, advanced persistent threats, and so on. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general fields, and there has been little research focusing on cybersecurity entity extraction in the security domain. To this end, in this paper, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we have named XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and concatenates X input with bidirectional LSTM output. Via extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF achieves better cybersecurity entity extraction than state-of-the-art models.
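To make the layered description above more concrete, the following is a minimal sketch in plain Python of two of the steps the abstract names: joining per-token input vectors with per-token bidirectional LSTM outputs, and decoding tags with a linear-chain CRF via the Viterbi algorithm. It is not the authors' implementation. The tag set, the score values, the function names `concat_features` and `viterbi_decode`, and the reading of "X input" as the per-token input vectors are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical tag set for illustration only (not taken from the paper).
TAGS = ["O", "B-VULN", "I-VULN"]


def concat_features(
    inputs: Sequence[Sequence[float]],
    lstm_out: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate each token's input vector with its BiLSTM output vector.

    Assumption: "X input" means the per-token vectors fed into the LSTM.
    """
    return [list(x) + list(h) for x, h in zip(inputs, lstm_out)]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # score[j] holds the best score of any path ending in tag j so far.
    score = list(emissions[0])
    backpointers: List[List[int]] = []

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_prev = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Backtrack from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for a three-token sentence. In a real model the emissions
# would be produced by a learned projection of the concatenated features.
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.2],
    [0.1, 0.3, 1.5],
]
# The large negative entry discourages the invalid step O -> I-VULN.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-VULN
    [0.0, 0.0, 0.5],    # from I-VULN
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
```

With these made-up scores, picking the best tag per token independently would label the second token `I-VULN` directly after `O`. The transition scores steer the decoder to `B-VULN` instead, which illustrates why a CRF layer is commonly placed on top of a BiLSTM for sequence labelling.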