Textual risk mining for maritime situational awareness

Abstract

In this paper, we propose an auxiliary system that integrates Machine Learning (ML) and Natural Language Processing (NLP) for maritime situational awareness (MSA) operations. We introduce a new and influential asset, human intuition and perception, into existing semi-automated decision support systems, which mostly rely on numerical data collected by electronic sensors or cameras located either on the vessels themselves or in maritime command-and-control centers. For this project, we gathered twelve months of weekly textual reports from the United States Worldwide Threats to Shipping Reports repository maintained by the National Geospatial-Intelligence Agency (NGA). We treat these maritime incident reports, written by human operators, as a valuable and accessible source of unstructured text in which a span of text is labeled a “risk” if it expresses one of the following kinds of vessel incidents: fired, robbed, boarded, hijacked, attacked, chased, approached, kidnapped, boarding attempted, suspiciously approached, or clashed with. Our approach exploits the probability distributions of features annotated with a set of lexicons containing expressions that denote vessel types, risk types, risk associates, maritime geographical locations, dates, and times. These distributions are used to anchor the boundaries of “risk” spans as they are described in the reports. After preprocessing, which includes tokenization, named entity extraction, and part-of-speech tagging, the textual risk mining system applies several sequence classification algorithms, including Conditional Random Fields, Conditional Markov Models, and Hidden Markov Models, and compares their risk classification performance.
Empirical results show that our NLP/ML-based system can extract variable-length risk spans from the textual reports with about 90% correctness.
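The span-tagging step can be illustrated with a minimal sketch of the simplest of the three models named, a first-order Hidden Markov Model whose observations are lexicon categories. Everything here is an assumption for illustration, not the authors' implementation: the toy lexicons, the B/I/E/O tag scheme (Begin, Inside, End of a risk span, Outside), the hand-labelled sentences, and the helper names `lexicon_feature`, `train_hmm`, `viterbi`, and `extract_spans` are all hypothetical. Parameters are estimated by counting with add-one smoothing, and decoding uses the Viterbi algorithm in log space.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicons; the paper's actual lexicon contents are not given.
LEXICONS = {
    "RISK": {"fired", "robbed", "boarded", "hijacked", "attacked",
             "chased", "approached", "kidnapped", "clashed"},
    "VESSEL": {"tanker", "vessel", "ship", "tug", "dhow", "trawler"},
    "LOC": {"aden", "lagos", "malacca", "singapore"},
}
STATES = ["B", "I", "E", "O"]  # Begin / Inside / End of a risk span, Outside
OBS = ["RISK", "VESSEL", "LOC", "OTHER"]

def lexicon_feature(token):
    """Map a token to the first lexicon category containing it, else OTHER."""
    t = token.lower()
    for label, words in LEXICONS.items():
        if t in words:
            return label
    return "OTHER"

def _log_normalize(counts, keys):
    """Add-one (Laplace) smoothed log-probabilities over `keys`."""
    total = sum(counts[k] for k in keys) + len(keys)
    return {k: math.log((counts[k] + 1) / total) for k in keys}

def train_hmm(tagged_sentences):
    """Estimate start, transition and emission log-probs by counting."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sentences:
        prev = None
        for token, tag in sent:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][lexicon_feature(token)] += 1
            prev = tag
    return (_log_normalize(start, STATES),
            {s: _log_normalize(trans[s], STATES) for s in STATES},
            {s: _log_normalize(emit[s], OBS) for s in STATES})

def viterbi(tokens, model):
    """Most likely tag sequence for `tokens` under the HMM (log space)."""
    if not tokens:
        return []
    start, trans, emit = model
    obs = [lexicon_feature(t) for t in tokens]
    scores = [{s: start[s] + emit[s][obs[0]] for s in STATES}]
    backptrs = []
    for o in obs[1:]:
        prev, col, ptr = scores[-1], {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            best = max(STATES, key=lambda p: prev[p] + trans[p][s])
            col[s] = prev[best] + trans[best][s] + emit[s][o]
            ptr[s] = best
        scores.append(col)
        backptrs.append(ptr)
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

def extract_spans(tokens, tags):
    """Join B (I)* E runs into risk spans; runs without an E are dropped."""
    spans, cur = [], None
    for token, tag in zip(tokens, tags):
        if tag == "B":
            cur = [token]
        elif tag in ("I", "E") and cur is not None:
            cur.append(token)
            if tag == "E":
                spans.append(" ".join(cur))
                cur = None
        else:
            cur = None
    return spans

# Hypothetical hand-labelled training sentences (not NGA data).
TRAIN = [
    list(zip("pirates boarded the tanker near aden".split(), "O B I E O O".split())),
    list(zip("men robbed the ship at lagos".split(), "O B I E O O".split())),
    list(zip("gunmen fired upon a vessel".split(), "O B I I E".split())),
]

if __name__ == "__main__":
    model = train_hmm(TRAIN)
    tokens = "robbers attacked the tug near lagos".split()
    tags = viterbi(tokens, model)
    print(tags)                         # ['O', 'B', 'I', 'E', 'O', 'O']
    print(extract_spans(tokens, tags))  # ['attacked the tug']
```

Because an HMM sees only one observation per token, it cannot use the richer preprocessing output (part-of-speech tags, named entities, neighbouring words) at the same time; a CRF or Conditional Markov Model would instead condition on overlapping feature sets of that kind, which is one plausible reason to compare the three model families.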
