Abstract

For modern mortgage firms, the process of setting up and verifying a new loan, known as origination, is complex and multifaceted. The literature notes that this process is rife with delays that can stunt the firm's business opportunities, but no modern analytical techniques have been developed to address the problem. In this paper, we suggest the use of text-analytic and machine-learning techniques to predict likely delays. In collaboration with a large national mortgage firm, we derive a large dataset of transcripts from employees' communications pertaining to potential loans. We first use information retrieval to generate an initial list of “seed terms,” or terms most associated with loans that were delayed. We then use an array of machine learning approaches to generate predictive models based on these seed terms. We find that these models perform comparably to less interpretable state-of-the-art approaches that use word embeddings. The resulting models offer interpretable and high-performing solutions to mitigate the risk of delays through early risk detection.
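
The seed-term step can be illustrated with a minimal sketch. The abstract does not say how terms are scored, so the smoothed log-odds ranking below is an assumption chosen for illustration, not the authors' method. The function names `seed_terms` and `featurize` and the toy transcripts are hypothetical.

```python
import math
from collections import Counter


def seed_terms(transcripts, delayed, top_k=5, smoothing=1.0):
    """Rank terms by smoothed log-odds of occurring in delayed vs. on-time transcripts.

    Assumption: log-odds over document frequencies stands in for the
    (unspecified) information-retrieval scoring used in the paper.
    """
    n_delayed = sum(delayed)
    n_ontime = len(delayed) - n_delayed
    df_delayed, df_ontime = Counter(), Counter()
    for text, is_delayed in zip(transcripts, delayed):
        terms = set(text.lower().split())  # count each term once per transcript
        (df_delayed if is_delayed else df_ontime).update(terms)

    scores = {}
    for term in set(df_delayed) | set(df_ontime):
        # Additive smoothing keeps both probabilities strictly inside (0, 1).
        p_d = (df_delayed[term] + smoothing) / (n_delayed + 2 * smoothing)
        p_o = (df_ontime[term] + smoothing) / (n_ontime + 2 * smoothing)
        scores[term] = math.log(p_d / (1 - p_d)) - math.log(p_o / (1 - p_o))

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def featurize(text, seeds):
    """Binary presence vector over the seed terms, usable by any classifier."""
    terms = set(text.lower().split())
    return [1 if s in terms else 0 for s in seeds]


# Toy, made-up transcripts (not from the paper's dataset).
transcripts = [
    "appraisal missing waiting on borrower documents",
    "title issue appraisal missing",
    "documents received closing scheduled",
    "closing scheduled funds received",
]
delayed = [True, True, False, False]

seeds = seed_terms(transcripts, delayed, top_k=2)
features = [featurize(t, seeds) for t in transcripts]
```

Because each feature corresponds to one named term, a model trained on these vectors can be inspected term by term, which is the kind of interpretability the abstract contrasts with word embeddings.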
