In March 2023, the latest version of ChatGPT debuted. Arguably the world's most recognized large language model (LLM), ChatGPT-4 continues the steady improvements made in earlier versions: its enhanced capacity to train on more inputs and newer data enables it to respond to increasingly broad queries with greater accuracy.
April 2024

Months later, OpenAI, the company behind ChatGPT, announced the availability of ChatGPT-4 Turbo, which, true to its name, is even faster, supports longer inputs, and is trained on more recent data (up to April 2023, compared with September 2021 for previous models). Multimodal applications integrating ChatGPT with the image-generating model DALL-E and, most recently, the video-generating model Sora expand the creative uses of this technology. Other LLMs, notably Microsoft's Copilot and Google's Gemini (previously named Bard), are also making headlines, further underscoring the transformative potential of this technology, in much the same way that internet search engines did in the 1990s.
In the time between the writing and printing of this article, new versions of these models, updates, and new applications will be in the news. Trying to keep up with the rapid evolution of this technology is challenging, only adding to the daunting task of implementation, particularly in a field like medicine, which relies so heavily on informed judgment and clinical expertise to fulfill its mission to first do no harm.
As with any powerful new technology, excitement over the real and potential benefits of LLMs within healthcare will need to be continually evaluated against real and potential risks. With the launch of ChatGPT for general usage, the time has arrived to weigh in on this balancing act as more people adopt the technology.
We definitely have crossed the threshold into a new era of AI, and we probably have already crossed the threshold of ubiquity of using AI on a day-to-day basis for a lot of individuals. — Alfred-Marc Iloreta, Jr., MD
“We definitely have crossed the threshold into a new era of AI, and we probably have already crossed the threshold of ubiquity of using AI on a day-to-day basis for a lot of individuals,” said Alfred-Marc Iloreta, Jr., MD, assistant professor of artificial intelligence and emerging technologies in the Graduate School of Biomedical Sciences at the Icahn School of Medicine at Mount Sinai in New York City, where he is also an assistant professor of otolaryngology–head and neck surgery and neurosurgery and co-directs the Endoscopic Skull Base Program.
Although uptake of AI in healthcare isn’t yet ubiquitous (in a December 2023 AMA survey of more than 1,000 physicians, only 38% of respondents said they currently use AI; AMA Augmented Intelligence Research, Nov. 2023), the easy access to and low or no cost of software like ChatGPT ensure its quick growth. Early users report its real-time benefits for everyday workflow activities such as administrative tasks and documentation (generating discharge notes, care plans, progress notes, clinical notes, and preauthorization letters), as was discussed in the March 2024 ENTtoday Tech Talk article. Other areas under investigation include education, research, and the generation of patient materials. For higher-order clinical activities, such as diagnosis, triage, and treatment decisions, research is ongoing to understand the safe, beneficial uses and limitations of LLMs.
Below is a brief sampling of the research that’s underway on implementing LLMs, and ChatGPT in particular, in otolaryngology education and patient communications.
ChatGPT for Otolaryngology Education
Habib Zalzal, MD, assistant professor of otolaryngology and pediatrics at Children’s National Medical Center, The George Washington University, Washington, D.C., sees adoption happening at a rapid rate among medical students, residents, and attendings, who in recent months have incorporated LLMs into their daily lives for educational purposes like studying or journal club summaries. “With this mass adoption, it’s only a matter of time before ChatGPT or other browser-based LLMs become a daily habit in our workday,” he said.
He cautioned, however, that ChatGPT and other LLMs do not, and cannot, replace traditional learning sources such as textbooks and journal articles, and should instead be seen as a supplement to them. In particular, he’s concerned about overreliance on ChatGPT for educating students who don’t yet have the prerequisite medical knowledge base on which to build critical thinking and judgment. “Reliance on early ChatGPT versions, much like a bad habit, is harder to break if the proper knowledge base isn’t there,” he said.
Part of Dr. Zalzal’s caution comes from data showing the limitations of ChatGPT for educational purposes. Following reports showing the ability of ChatGPT to exceed the passing score on the medical licensing exam (PLOS Digit Health. 2023. doi.org/10.1371/journal.pdig.0000198; Sci Rep. 2023. doi.org/10.1038/s41598-023-43436-9), he and his colleagues undertook a study to quantify how well ChatGPT-3.5 concurred with expert otolaryngologists when asked high-level questions requiring both rote memorization and critical thinking (OTO Open. 2023. doi:10.1002/oto2.94). The tool performed better on open-ended questions (56.7% accuracy) than on multiple-choice questions (43.3% accuracy) but, overall, wasn’t sufficient as a stand-alone educational tool. Its lower accuracy on multiple-choice questions was attributed to ChatGPT’s tendency to provide some form of answer even when it didn’t know the answer, which can easily generate a false or made-up response, called a hallucination.
“LLMs can sometimes generate plausible yet incorrect answers that may mislead or harm users,” he said. “Even if the training data were created using validated sources, the risk of hallucination by the AI model could lead to the spread of misinformation or misuse that cannot be easily controlled.”
Improved accuracy was reported in more recent studies using an LLM trained on a comprehensive knowledge database of otolaryngology-specific information integrated into ChatGPT-4 (JMIR Med Educ. 2024. doi:10.2196/49970; Lancet. 2023. doi:10.2139/ssrn.4571725 (preprint)). Called ChatENT, the model was developed by researchers at the University of Alberta and Copula AI and, according to the authors, is the first specialty-specific LLM in the medical field.
When challenged with practice questions for board certifying exams in Canada and the United States, ChatENT scored 87.2% accuracy on open-ended, short-answer questions and 80% on multiple-choice questions, outperforming ChatGPT-4 with fewer hallucinations and errors identified.
Cai Long, MD, the study’s lead author and an otolaryngology surgical resident at the University of Alberta, said the model, still in its early beta stage, is being continuously updated and improved, and that its current state may not fully represent its most refined or comprehensive version. “Future iterations and research are expected to address these aspects, further enhancing the model’s robustness and applicability,” he said. Users who want to test the model can access it at https://www.chatent.net.
Potential applications of ChatENT cited by Dr. Long include medical education, patient education, and clinical decision support, the last of which, he said, has yet to be studied for efficacy.
Eric Gantwerker, MD, MSc, MS, a pediatric otolaryngologist and associate professor at Northwell Health in New York City who regularly teaches students and faculty how to use ChatGPT, views apparent limitations such as hallucinations as part of educating students on the strengths and weaknesses of the technology. He also shows them how they can leverage it to test their own knowledge by judging the validity of outputs from the platform.
People don’t realize that with subsequent updated models, limitations like hallucinations are going to go away. — Eric Gantwerker, MD, MS