Publication | Open Access
Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments
Citations: 29
References: 19
Year: 2023
Consistent with prior findings, we demonstrate robust performance of ChatGPT within the surgical domain, near or above human level. Unique to this study, we demonstrate substantial inconsistency in ChatGPT's responses to repeated queries. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT would be able to safely assist clinicians in providing care.