
TLDR

Large language models (LLMs) such as GPT-4 are transforming software's ability to understand language, prompting the data-management community to reflect on this disruption as it did for the web, cloud computing, and statistical machine learning. The authors contend that LLMs will disrupt data management from two angles: by enabling new solutions to hard database problems, and by blurring the line between predictive modeling and information retrieval. They illustrate both angles with concrete examples. They argue that LLMs can overcome automation ceilings in entity resolution, schema matching, data discovery, and query synthesis by grounding database elements in real-world concepts, and that the question-answering capabilities of LLMs blur the boundary between predictive models and information-retrieval systems.

Abstract

Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to understand, process, and synthesize language. The authors of this paper believe that this advance in technology is significant enough to prompt introspection in the data management community, similar to previous technological disruptions such as the advents of the world wide web, cloud computing, and statistical machine learning. We argue that the disruptive influence that LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely, entity resolution, schema matching, data discovery, and query synthesis, hit a ceiling of automation because the system does not fully understand the semantics of the underlying data. Based on large training corpora of natural language, structured data, and code, LLMs have an unprecedented ability to ground database tuples, schemas, and queries in real-world concepts. We will provide examples of how LLMs may completely change our approaches to these problems. (2) LLMs blur the line between predictive models and information retrieval systems with their ability to answer questions. We will present examples showing how large databases and information retrieval systems have complementary functionality.
