Given the current training environment and the pace of technological advancement, much discussion has centered on the future of medical and surgical education. In the spirit of this discussion, we entertained the question of whether our trainees should be required to use extended reality (XR) technologies to achieve a certain level of proficiency before engaging with hands-on surgical experiences. These technologies include virtual reality (VR), augmented reality (AR), and mixed reality (MR).
Our con debate is headlined by Anaïs Rameau, MD, MPhil, MS, who is an associate professor, National Institutes of Health (NIH)-funded laryngologist, and director of new technologies and chief of dysphagia in the department of otolaryngology at Weill Cornell Medical College in New York. She has clinical expertise in voice, cough, airway, and swallowing dysfunction, and scientific expertise in wearable device technology, AI application to voice and video data, and low-cost tech for low-resource settings.
Our pro debate is led by Eric Gantwerker, MD, MSc, MS, who is a pediatric otolaryngologist at Northwell Health/Cohen Children’s Medical Center and an associate professor of otolaryngology at Zucker School of Medicine at Hofstra/Northwell in New Hyde Park, N.Y. He holds a Master of Medical Science in medical education with a special focus on educational technology, educational research, and game-based learning.
Dr. Rameau: Against XR Mandatory Trainee Proficiency Before OR
Though the promise of improving trainees’ performance in the operating room with extended reality (XR) is exciting, I see several arguments against mandatory use prior to surgical participation.
By far the most convincing argument is that the validity and effectiveness of these simulation technologies have not been demonstrated at scale. Current research is primarily at the pilot stage, with few prospective and external validations. A recent scoping review (Int Orthop. doi: 10.1007/s00264-022-05663-z) of XR in surgery broadly found that most studies focused on XR and surgical training. Among these studies, measures of validity of XR programs varied, including construct validity (Does the program differentiate performance level based on trainee experience?), face validity (Is the program realistic?), content validity (Is the program representative of the surgical task it aims to measure?), and concurrent validity (How well does the program compare to existing training programs?). Some studies focused on surgical outcomes after XR training, including surgery duration, complications, degree of hand/head movement, and reported self-confidence.
With such a diversity of measures of validity and effectiveness, it is hard to draw firm conclusions on the value of XR in surgical training, and there is no basis for making such training tools mandatory. This problem is further compounded by evidence of the superiority of traditional dry lab simulations to XR-based training. Within otolaryngology–head and neck surgery (OHNS), researchers conducted a scoping review of XR applications and found that most studies remained at the proof-of-concept level to demonstrate feasibility; most relied on retrospective surgical data to create XR environments, and none measured surgical outcomes prospectively; and most relied on subjective assessment of XR’s value rather than on quantitative measures of clinical benefit (J Clin Med. doi: 10.3390/jcm13216295).
Although it remains to be studied, the cognitive load of XR surgical training could actually impede learning among trainees. In short, the current evidence for XR applications in OHNS does not substantiate a mandate to use it in training, and it remains to be proven that XR is superior to existing training methods, including well-established ones in the dry lab and the cadaver lab.
Other barriers may be surmountable with advances in technology, but they remain real at this time. These include the cost and complexity of developing a realistic XR surgical training program. An augmented reality training pilot program typically costs between $25,000 and $50,000 or more, depending on the instructional design, procedure complexity, and custom features (Roundtable Learning. https://tinyurl.com/4v23xna8). XR software costs do not include the cost related to expert surgeons’ time. To create a high-quality scenario, the participation of expert surgeons in content development and feedback is mandatory. Getting feedback from multiple expert surgeons would be ideal to reach a consensus on content and represent diverse surgical scenarios and interventions. Expert surgeons’ time is not only scarce but also extremely valuable.
Robust validation of the XR training program, as mentioned earlier, is essential and bears its own cost, especially if done rigorously. One should, therefore, not equate the current low cost of VR/AR headsets with the actual cost of an XR software training program. Given the lack of robust data on return on investment, it is unclear if such large software expenses are justified. Although some argue that XR training may increase access to surgical training, the reality is that the current cost of VR/AR headsets may be prohibitive in low-resource settings.
Furthermore, given the high cost of developing XR-based surgical training software, creators of the software may be more motivated to monetize their work than to make it open access. VR/AR headsets are regularly updated, software developed for new hardware may not be compatible with older models, and it may be a high financial burden for residency programs to purchase new headsets every few years. The XR training program cost will likely improve with time as generative AI continues to advance with its already-proven capabilities in video creation. This will likely lead to more cost-effective, high-quality software development that could be performed by less highly skilled engineers.
Finally, there are at least two technical arguments against mandatory XR training in OHNS that relate to clinical informatics. The first is that software compatibility is currently an issue across the various XR platforms. Interoperability issues may limit the ability to operate surgical training software depending on the brand of AR/VR headsets a residency program owns. Interoperability is a common issue in the medical industry and a complex one to resolve.
The second issue has to do with the privacy and security of patients’ and trainees’ data. Many XR platforms are developed by companies that have a vested interest in obtaining more real-world data on patients’ surgical cases and presentations and trainees’ performance to advance their own research and development programs. Reviewing XR companies’ data flows and practices is a complex and often lengthy process. Many healthcare institutions have developed committees and protocols to review tech companies’ data-sharing practices and conformity with HIPAA and institutional policy before issuing business associate agreement contracts. The liability of these contracts may be too burdensome for small XR startups to sign, which could hamper their success and ultimately hurt the rich startup ecosystem needed to support substantial innovation in XR surgical training.
Though XR’s potential in surgical training is exciting, there is a clear dissonance between the hype and the reality. As I argued, there is a lack of evidence for the validity of current prototypes, costs are prohibitive for widespread adoption of XR mandatory training, and finally, interoperability and data security challenges are common problems that are complex to resolve.
Dr. Gantwerker: For XR Mandatory Trainee Proficiency Before OR
In the current surgical training environment, we rely heavily on an established tradition of expert consensus of readiness for graduated responsibility as conceived under Dr. William Stewart Halsted (father of the model for current surgical training programs). One concern with this tradition is the relative subjectivity of the experts who are engaged in the assessment of our trainees and the heavy reliance on their individual experience, judgment, and ability to monitor trainee performance.
There is a wide degree of variability in experience and judgment when rating trainees, even when established frameworks are used, such as the Global Evaluative Assessment of Robotic Skills (GEARS). Dr. Ruben De Groote and his co-authors concluded in their study that “[The interrater reliability] levels of the GEARS assessment [are] poor, and this replicates previous findings on the use of Likert scales in general.” They go on to say, “Given the widespread usage of these types of assessments, this is a worrying observation as a validated test that is demonstrated to be unreliable is by default not valid” (Ann Surg Open. doi: 10.1097/AS9.0000000000000307).
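To make the idea of interrater reliability concrete, here is a minimal sketch of one common agreement statistic, unweighted Cohen's kappa, computed for two raters scoring the same trainee performances on a 1-to-5 Likert scale. The scores and the names `attending_1` and `attending_2` are invented for illustration, and the cited study may have used a different reliability measure; this shows only how agreement beyond chance can be quantified.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Likert scores (1-5) from two attendings rating eight videos.
attending_1 = [3, 4, 2, 5, 3, 4, 2, 3]
attending_2 = [4, 4, 3, 5, 2, 3, 2, 4]
kappa = cohens_kappa(attending_1, attending_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates close agreement, while values near 0 indicate agreement no better than chance. In this invented example the raters match on only three of eight performances, so the statistic is low.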
In addition, very few surgeons are formally trained on how to assess and give feedback to trainees. The system has worked for many decades, but some may say it has worked despite this system rather than because of it. This reality was well crystallized and forced into the public eye after the publication of To Err is Human in 2000. The book created tremendous pressure on training programs to keep patient safety paramount while also improving efficiency and throughput. All academic surgeons understand the constant battle between efficiency, safety, and creating a conducive training environment for residents and fellows.
The role of simulation has expanded over the last few decades to help provide an environment for trainees to have deliberate practice without direct patient risk. Simulation has become a mandatory component of training and assessment of procedural skills in pilots, astronauts, and several other industries, but somehow has not yet become a mandatory component of medical or surgical training or assessment. As technology has advanced with XR and AI, we have an unprecedented opportunity to turn traditional Halstedian training on its head. XR technologies, including VR, AR, and MR, have been in existence since the 1960s but have recently gained mainstream momentum as computing power has rapidly advanced and consumer-level products have come to market.
XR has become a massive industry, reaching $184 billion in 2024, and it is projected to reach $1.7 trillion in 2032 (Fortune Business Insights. https://tinyurl.com/4r3fkj93). Healthcare-focused XR companies seem to pop up by the dozens every day, and surgical applications of XR have expanded to anatomical viewing, surgical preparation and planning, and even intra-operative XR-assisted surgery.
AI has existed since the 1950s, when Alan Turing published Computing Machinery and Intelligence, and it has exploded in the last decade. Healthcare applications are seemingly limitless. A subset of AI called computer vision has enabled several use cases in surgery, including real-time coaching and mentoring during surgery and objective assessment of skill. Paired with XR, this technology can create a real-time roadmap during surgery with anatomical overlays and real-time surgical guidance.
The natural progression of these technological advancements is to leverage them to create surgical trainees who are as proficient as they can be before direct patient care. In otolaryngology, we often do this in otology with the temporal bone lab, but this has not permeated into any other surgical procedures. During my training, I was required to do 50 stapedectomies with stapes implant placement the night before participating in a stapes surgery with my chair. He always stated that he would know if we did not do the practice the night before, and I have no doubt he could.
However, with XR and AI, we not only have the opportunity to conduct 50 of these procedures, but we can also determine the actual training curves for every individual surgical trainee and realize that some might be ready after 10 sessions, while some may take 100. This would be seamlessly determined by the actual simulation training, all of which occurs on the trainees’ own time without taking up faculty time. Through these AI tutors, we have the potential to “support student learning at a scale previously unimaginable” (Pearson Education. https://tinyurl.com/5548pph8).
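As a rough illustration of individualized training curves, the sketch below reports the session at which a trainee first meets a proficiency threshold for several sessions in a row. The scoring scale, the 0.9 threshold, the three-session streak, and the function name `sessions_to_proficiency` are all assumptions made for this example; the article does not specify how a simulator would define or measure proficiency.

```python
def sessions_to_proficiency(scores, threshold=0.9, consecutive=3):
    """Return the 1-based session number at which the trainee has scored at
    or above `threshold` for `consecutive` sessions in a row, or None if
    that has not happened yet."""
    streak = 0
    for session_number, score in enumerate(scores, start=1):
        # Extend the streak on a passing score; reset it on a failing one.
        streak = streak + 1 if score >= threshold else 0
        if streak >= consecutive:
            return session_number
    return None


# Hypothetical per-session simulator scores (0 to 1) for two trainees.
fast_learner = [0.70, 0.85, 0.92, 0.95, 0.93, 0.96]
slow_learner = [0.50, 0.60, 0.65, 0.70, 0.92, 0.80, 0.91, 0.93]

fast_result = sessions_to_proficiency(fast_learner)
slow_result = sessions_to_proficiency(slow_learner)
print(fast_result, slow_result)
```

Requiring a streak rather than a single passing score is one simple way to avoid declaring readiness on the basis of one lucky session. A real system would need validated performance metrics behind the scores.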
Obviously, it is unrealistic to expect every surgical procedure to currently be modeled well enough in XR to provide this type of experience, but that does not mean we shouldn’t strive to create as many as we can. We already have temporal bone XR simulations that provide a great deal of this individualized training, even in patient-specific temporal bone cases. Imagine if the resident could train on the actual patient case in XR in the days to weeks leading up to the case and then, once in the operating room, have already done that exact case to objective proficiency. Imagine the effect on patient safety, efficiency, and patient outcomes. I am hopeful we can make that dream a reality.
Conclusions
As discussed here, the answer is not straightforward. Technology offers much hope and hype. When will technology meet the education needs and level of validity needed to add value to our surgical education? When will it reach the level of fidelity needed to truly create a realistic, safer training environment? Please let us know your thoughts and comments.
Dr. Gantwerker is a pediatric otolaryngologist at Cohen Children’s Hospital at Northwell Health in New Hyde Park, N.Y., who has also treated adults and closely follows tech advances in otolaryngology.
Dr. Rameau is an assistant professor and the director of new technologies in the department of otolaryngology–head and neck surgery at Weill Cornell Medical College in New York.