Automated Domain Modeling with Large Language Models: A Comparative Study

Abstract

Domain modeling is an essential part of software engineering and serves as a way to represent and understand the concepts and relationships in a problem domain. Typically, software engineers interpret the problem description written in natural language and manually translate it into a domain model. Domain modeling can be time-consuming and depends heavily on the expertise of software engineers. Recently, Large Language Models (LLMs) have exhibited remarkable ability in language understanding, generation, and reasoning. In this paper, we conduct a comprehensive, comparative study of using LLMs for fully automated domain modeling. We assess two powerful LLMs, GPT3.5 and GPT4, employing various prompt engineering techniques on a data set containing ten diverse domain modeling examples with reference solutions created by modeling experts. Our findings reveal that while LLMs demonstrate impressive domain understanding capabilities, they are still impractical for full automation, with the top-performing LLM achieving F1 scores of 0.76 for class generation, 0.61 for attribute generation, and 0.34 for relationship generation. Moreover, the F1 score is characterized by higher precision and lower recall; thus, domain elements retrieved by LLMs are often reliable, but there are many missing elements. Furthermore, modeling best practices are rarely followed in auto-generated domain models. Our data set and evaluation provide a valuable baseline for future research in automated LLM-based domain modeling.
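
To make the reported metrics concrete, the following minimal sketch shows how precision, recall, and F1 could be computed for one category of domain elements (here, classes). It assumes exact name matching between generated and reference elements, which is a simplification and not necessarily the matching criterion used in the study. The function name `precision_recall_f1` and the example class names are hypothetical and are not taken from the study's data set.

```python
def precision_recall_f1(generated, reference):
    """Score generated element names against a reference solution.

    Assumption: two elements match only if their names are identical.
    """
    generated, reference = set(generated), set(reference)
    true_positives = len(generated & reference)
    # Precision: share of generated elements that appear in the reference.
    precision = true_positives / len(generated) if generated else 0.0
    # Recall: share of reference elements that were generated.
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical class names for illustration only.
reference_classes = {"Library", "Book", "Member", "Loan", "Author"}
generated_classes = {"Library", "Book", "Member", "Catalog"}

p, r, f1 = precision_recall_f1(generated_classes, reference_classes)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case three of the four generated classes are correct (precision 0.75), but only three of the five reference classes are found (recall 0.60). This is the same pattern the abstract describes: the retrieved elements are mostly reliable, while several expected elements are missing.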
