There are increasing opportunities to use enormous data sets in research, but getting started can be a daunting task, and pitfalls abound, said panelists at the 2021 American Academy of Otolaryngology–Head and Neck Surgery Annual Meeting. Experts discussed the main features, benefits, and drawbacks of the largest databases, from pediatric and insurance databases to nationally representative surveys and the academy’s own registry, while emphasizing that even though using them correctly isn’t a simple matter, it is a vital one.
“There are data that are relevant to our healthcare and health outcomes all around us and in many different sources,” said Derek Lam, MD, MPH, moderator of the panel and associate professor of otolaryngology–head and neck surgery at Oregon Health and Science University in Portland. “Once the data are there, then they have to be accessed, cleaned, and processed. The descriptive analytics involved are often very complicated.”
Differences in Databases
These large databases have strong statistical power and generalizability, and usually precision in their data collection, but care is needed in interpreting the data, and there can be problems with accuracy in administrative data collection, Dr. Lam added.
The panelists reviewed several databases:
- IBM MarketScan Commercial Claims and Encounters Database: This includes insurance claim data for about 50 million privately insured Americans each year. The data include inpatient, outpatient, emergency department, and pharmaceutical claims in a database that was created in 1996 and set up so that individuals’ data can be linked across insurers. But it’s expensive—five years of data costs about $25,000, and an experienced programmer is needed to analyze the data in what can be a time-consuming process, Dr. Lam said.
- American College of Surgeons’ National Surgical Quality Improvement Program (NSQIP): This database uses a nationally validated outcomes-based approach to measure the quality of surgical care. Its institutional data are clinical data drawn from the medical record, not administrative data, and are collected by a trained, certified data collector. Patients can be followed over time, and procedure-specific and long-term outcomes can be assessed. Its semi-annual report of data allows benchmarks from an institution to be compared to 102 other institutions in a risk-adjusted way. One example of a study using NSQIP data is a look at safety and postoperative adverse events in pediatric otologic surgery, Dr. Lam said.
- Kids Inpatient Database: As its name suggests, this is a publicly available database of pediatric inpatient care that includes 2 to 3 million discharges a year, said Nikhila P. Raol, MD, MPH, assistant professor of otolaryngology–head and neck surgery at Emory University School of Medicine in Atlanta. Because it is based on hospital encounters, the database can’t be used to track patients over time. To get a sample that is nationally representative, the results must be weighted, Dr. Raol said. A convenient feature is that users can query the database with a research question and find out whether the database can answer that question. “You say, ‘Do I have the number of patients that I need to answer this question?’ or ‘Do these complications occur frequently enough? Is this condition captured enough?’” she said. Dr. Raol has used it to examine whether the cost of tonsillectomy differed depending on where the surgery was done.
- Pediatric Health Information System: This includes inpatient, ambulatory surgery, observation, and emergency encounters from 52 children’s hospitals. The data from 1999 are longitudinal, meaning that patients’ variables were repeatedly observed over periods of time, so outcomes and utilization can be looked at over time and patients can be tracked across their encounters. This database, Dr. Raol said, would be appropriate for research questions about how often a disease occurs in a population, how frequently a procedure is performed in a given population, or how often a certain comorbidity is present among hospitalized children with a specific diagnosis. She cautioned that any care that occurs outside the children’s hospital system won’t be captured, so some episodes of care will be missed for individual patients, even though this is a longitudinal database.
- National Hospital Ambulatory Medical Care Surveys: Data in this set are collected in annual installments and are captured as provider–patient visits. The database includes office-based and hospital-based visits, but not administrative data, electronic data, radiology information, anesthesia, or other kinds of data.
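The weighting step Dr. Raol mentions for the Kids Inpatient Database can be illustrated with a minimal Python sketch. It assumes each sampled discharge record carries a weight equal to the number of national discharges it represents. The field names (`cost`, `weight`) and all of the numbers are hypothetical and are not taken from any real database release.

```python
# Illustrative sketch only: the records, field names, and weights are made up.
# Each sampled discharge is assumed to stand in for `weight` discharges nationally.
sample = [
    {"cost": 4000.0, "weight": 10.0},
    {"cost": 6000.0, "weight": 30.0},
    {"cost": 9000.0, "weight": 10.0},
]


def weighted_total(records):
    """Estimated national number of discharges: the sum of the weights."""
    return sum(r["weight"] for r in records)


def weighted_mean(records, field):
    """Weighted mean of `field`, so heavily weighted records count more."""
    total_weight = weighted_total(records)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(r[field] * r["weight"] for r in records) / total_weight


# Compare the naive sample average with the weighted national estimate.
unweighted = sum(r["cost"] for r in sample) / len(sample)
national_estimate = weighted_mean(sample, "cost")
print(weighted_total(sample), unweighted, national_estimate)
```

In this toy sample the unweighted and weighted averages differ, which is why skipping the weights can give a misleading national picture. A real analysis would also need to account for the survey’s sampling design when estimating variances, which this sketch omits.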
Advantages and Pitfalls
“What’s particularly compelling about these national databases is that they’re designed so that the results can be representative of how the whole United States utilizes and provides ambulatory care,” said Jennifer J. Shin, MD, SM, associate professor of otolaryngology–head and neck surgery at Harvard Medical School in Boston.