
Publication | Open Access

Bidirectional LSTM-CRF Models for Sequence Tagging

3.3K citations · 21 references · Published 2015

TLDR

The paper proposes a range of LSTM-based models for sequence tagging. The authors implement several architectures: LSTM, BI-LSTM, LSTM-CRF, and BI-LSTM-CRF, the last combining a bidirectional LSTM with a CRF layer to capture both past/future context and sentence-level tag dependencies. The BI-LSTM-CRF achieves accuracy at or near the state of the art on POS tagging, chunking, and NER benchmarks, and proves robust with reduced reliance on word embeddings.

Abstract

In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component. It can also use sentence level tag information thanks to a CRF layer. The BI-LSTM-CRF model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets. In addition, it is robust and has less dependence on word embedding as compared to previous observations.
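The CRF layer mentioned in the abstract is what lets the model use sentence-level tag information: instead of picking each token's tag independently, decoding maximizes the sum of per-token emission scores (from the LSTM) plus learned tag-transition scores. A minimal sketch of that decoding step (Viterbi) is below; the tag set, scores, and function name are illustrative assumptions, not taken from the paper.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list of dicts {tag: score}, one per token (e.g. LSTM outputs)
    transitions: dict {(prev_tag, tag): score} of tag-transition scores
    """
    tags = list(emissions[0].keys())
    # Best path score ending in each tag at the first token.
    best = {t: emissions[0][t] for t in tags}
    backptr = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for t in tags:
            # Pick the previous tag that maximizes path score into t.
            prev, score = max(
                ((p, best[p] + transitions[(p, t)]) for p in tags),
                key=lambda x: x[1],
            )
            new_best[t] = score + em[t]
            ptr[t] = prev
        best = new_best
        backptr.append(ptr)
    # Backtrack from the best final tag to recover the full sequence.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))


# Toy example (hypothetical scores): the second token locally prefers "I",
# and the first locally prefers "O", but the transition O -> I is heavily
# penalized, so the CRF flips the first tag to "B" for a consistent sequence.
tags = ["B", "I", "O"]
trans = {(p, t): (-10.0 if (p, t) == ("O", "I") else 0.0)
         for p in tags for t in tags}
ems = [{"B": 1.0, "I": 0.0, "O": 2.0},
       {"B": 1.0, "I": 3.0, "O": 0.0}]
print(viterbi_decode(ems, trans))  # -> ['B', 'I']
```

This illustrates the point made in the abstract: per-token argmax would output ["O", "I"], an invalid chunking, while the transition scores steer decoding toward a globally consistent tag sequence.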

