LI-EMRSQL: Linking Information Enhanced Text2SQL Parsing on Complex Electronic Medical Records

TLDR

Converting natural language into executable SQL queries is valuable in healthcare, especially for electronic medical records (EMRs), but designing a versatile parser that generalizes to new databases remains a major challenge. The authors propose LI-EMRSQL, a framework for building a Text-to-SQL parser that can correlate the intricate medical terminology found in EMRs. LI-EMRSQL improves schema linking with an unsupervised detection procedure based on the Poincaré distance metric; the relations it induces enhance existing graph-based parsers and help them accurately identify references to unseen columns or tables. LI-EMRSQL achieves state-of-the-art performance on two conventional and two EMR Text-to-SQL datasets, with notable improvements in schema comprehension and alignment.

Abstract

Converting natural language text into executable SQL queries significantly impacts the healthcare domain, specifically when applied to electronic medical records (EMRs). Because EMRs store extensive patient information in a relational, multi-table database, a Text-to-SQL parser would make it possible to correlate intricate medical terminology through semantic parsing. A major challenge is designing a versatile Text2SQL parser that is applicable to new databases. A critical step towards this goal is schema linking: accurately identifying references to previously unseen columns or tables during SQL generation. In response to these challenges, we propose a novel framework, Linking Information Enhanced Text2SQL Parsing on Complex Electronic Medical Records (LI-EMRSQL). The model uses a detection procedure based on the Poincaré distance metric and feeds the induced relations into pre-existing graph-based parsers to improve their schema linking. To enhance the generalizability of LI-EMRSQL, the detection process is completely unsupervised and requires no additional parameters. On two conventional Text2SQL datasets and two EMR Text2SQL datasets, the system delivers state-of-the-art performance. Furthermore, notable improvements are observed in the model's comprehension and alignment of schemas.
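
The abstract names the Poincaré distance metric but does not spell out the detection procedure itself. As a minimal sketch, the snippet below computes only the standard distance on the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and uses it to pick the closest schema item for a question token. The helper `nearest_schema_item`, the toy embeddings, and the column names are hypothetical illustrations and are not taken from the paper.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Standard distance between two points inside the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Guard against division by a vanishing denominator near the boundary.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def nearest_schema_item(token_vec, schema_vecs):
    """Hypothetical helper: return the schema item closest to a question token."""
    return min(schema_vecs, key=lambda name: poincare_distance(token_vec, schema_vecs[name]))


# Toy 2-D embeddings (assumed values, for illustration only).
schema = {
    "patients.dob": [0.10, 0.60],
    "admissions.diagnosis": [0.55, -0.20],
}
question_token = [0.50, -0.10]  # e.g. an embedding for the word "diagnosed"
best_match = nearest_schema_item(question_token, schema)
```

In this toy setup the token is linked to whichever column lies nearest in hyperbolic space. No parameters are trained here, which mirrors the unsupervised framing in the abstract, though the actual system may induce relations quite differently.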
