Automated Requirements Identification from Construction Contract Documents Using Natural Language Processing

Abstract

Contract documents are a critical legal component of a construction project: they specify all of the owner's expectations for the design, construction, and handover of the project. Precise comprehension of these documents is essential to ensure that all important contractual requirements within the project scope are captured and managed. A contract package typically contains both requirements and nonrequirement text, such as instructions and supporting statements, so practitioners must read the documents and identify the text that states requirements. This conventional manual approach to scope comprehension is time-consuming and labor-intensive, and it is prone to human error. Yet little attention has been paid to the automated identification of requirement text. This study introduces an automated framework that identifies contractual requirements using natural language processing (NLP) and machine learning techniques. Four machine learning algorithms, namely naïve Bayes, support vector machine (SVM), logistic regression, and feedforward neural network, were used to develop classification models that label contractual text as either requirement or nonrequirement text. Experiments showed that the SVM model outperformed the other models in accuracy, precision, recall, and F1-score, and that unigram features yielded better results than higher-order n-gram features. An experimental study with human participants further demonstrated that the developed model is efficient and effective, helping to reduce reading time and improve contract scope comprehension.
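The abstract describes three technical steps: extracting n-gram features from contract sentences, training a binary requirement/nonrequirement classifier, and scoring it with precision, recall, and F1-score. The following is a minimal sketch of those steps under stated assumptions, in standard-library Python. It uses multinomial naïve Bayes (one of the four algorithms the study compared, not the best-performing SVM) only because it fits in a few lines without third-party libraries. The tokenizer, the `REQ`/`NON` labels, the names `ngrams`, `NaiveBayes`, and `precision_recall_f1`, and all training and test sentences are hypothetical illustrations. They are not the study's data or implementation.

```python
import math
import re
from collections import Counter

def ngrams(text, n=1):
    """Lowercase, split on alphanumeric runs, and return the list of n-grams (hypothetical tokenizer)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing; an illustrative stand-in, not the study's model."""

    def __init__(self, n=1):
        self.n = n  # n-gram order: 1 = unigrams, 2 = bigrams, ...

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for class priors
        self.feat_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text, self.n))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.feat_counts[c])
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        feats = [f for f in ngrams(text, self.n) if f in self.vocab]  # ignore unseen n-grams
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each feature occurrence.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for f in feats:
                score += math.log((self.feat_counts[c][f] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

def precision_recall_f1(y_true, y_pred, positive="REQ"):
    """Binary precision, recall, and F1 with `positive` as the requirement class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy sentences, invented for illustration only.
train = [
    ("The Contractor shall submit shop drawings for approval.", "REQ"),
    ("The Contractor shall provide temporary fencing around the site.", "REQ"),
    ("All concrete shall achieve the specified compressive strength.", "REQ"),
    ("The Contractor must repair any damage to existing utilities.", "REQ"),
    ("This section describes the general layout of the document.", "NON"),
    ("Refer to the drawings for additional information.", "NON"),
    ("The following paragraphs are provided for reference only.", "NON"),
    ("Instructions to bidders are included in Appendix A.", "NON"),
]
test = [
    ("The Contractor shall submit a safety plan.", "REQ"),
    ("Refer to the appendix for more information.", "NON"),
    ("The Contractor must protect existing trees.", "REQ"),
    ("This document describes general conditions.", "NON"),
    ("Provide information to the Owner for review.", "REQ"),
]

model = NaiveBayes(n=1).fit([t for t, _ in train], [y for _, y in train])
preds = [model.predict(t) for t, _ in test]
p, r, f = precision_recall_f1([y for _, y in test], preds)
print(preds)  # ['REQ', 'NON', 'REQ', 'NON', 'NON']
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=1.000 recall=0.667 f1=0.800
```

In this toy run the last test sentence, a requirement phrased without "shall" or "must", is labeled `NON`, so recall falls to 2/3 while precision stays at 1.0. Passing `n=2` to `NaiveBayes` switches the sketch from unigram to bigram features, which is the kind of comparison the abstract reports.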
