The AAO-HNS Reg-ent database continues to grow, with more than 19 million office visits recorded in 2020, up from about 3 million in 2016. In addition, the patients for whom data are collected are well distributed across age groups, Dr. Shin said. “Even in the age groups that are the least populated, there’s still a pretty good amount of patients, and this is really helpful,” she said. “In Reg-ent, the population is nicely filled out. It looks to be a really good resource from that respect.”
Explore This Issue: November 2021
David O. Francis, MD, associate professor of surgery at the University of Wisconsin, Madison, cautioned that pitfalls abound when using big data. Hundreds of paper submissions are never even sent out for peer review because of flaws in their technique or because they try to answer a research question that can’t be addressed with the dataset in use.
What’s particularly compelling about these national databases is that they’re designed so that the results can be representative of how the whole United States utilizes and provides ambulatory care. —Jennifer J. Shin, MD, SM
“Most of us, when we consider these database projects, especially when we read them, just think, ‘Hey these people put a bunch of data into the computer and it spit out facts,’” he said. “But to do it properly is much more complicated because the data are imperfect. All data collected are imperfect. Data are collected for different purposes.” There’s a risk of spreading false information if data use isn’t properly considered.
When setting out on a big data research project, said Dr. Francis, researchers should:
- Make sure their project is hypothesis driven rather than an exercise in mining data for relationships. With such large amounts of data, statistically significant associations are plentiful, but statistical significance doesn’t mean an association is real or relevant; many are spurious.
- Seek institutional review board approval and comply with data use agreements.
- Do the “homework” of understanding the peculiarities of the database, and make sure to use appropriate variables and methodologies; administrative data and clinical data are not the same, for instance.
- Clearly define inclusion and exclusion criteria.
- Identify potential confounders and use risk adjustment to minimize bias.
- Account for updates and changes to variables over time.
- Identify and address competing risks.
- Determine how to handle missing data.
- Have a clear take-home message.
“This should all be thought about before you even start the study,” he said. “What are you trying to say, what are you trying to study? As you move on, you need to think about what your message is and how your research advances current knowledge.”
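The first item on Dr. Francis's list, the warning against data mining, can be illustrated with a small simulation. The sketch below is not from his talk. It assumes a hypothetical dataset of 500 patients with 200 variables that are pure random noise, so none of them is truly related to the outcome. The function names, sample sizes, and the use of a Fisher z approximation for the p-value are all illustrative choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r using the Fisher z-transform.

    Under no association, atanh(r) * sqrt(n - 3) is roughly standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_spurious_hits(n_patients=500, n_variables=200, alpha=0.05, seed=0):
    """Count noise variables that look 'significant' against a noise outcome.

    All data here are simulated (hypothetical); no variable has a real effect.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    hits = 0
    for _ in range(n_variables):
        predictor = [rng.gauss(0, 1) for _ in range(n_patients)]
        if two_sided_p(pearson_r(predictor, outcome), n_patients) < alpha:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 unrelated variables reached p < 0.05")

# A stricter threshold (Bonferroni correction: alpha divided by the number
# of tests) removes most or all of these chance findings.
strict_hits = count_spurious_hits(alpha=0.05 / 200)
print(f"{strict_hits} of 200 remain after Bonferroni correction")
```

At a 0.05 threshold, roughly 5% of unrelated variables are expected to pass by chance alone, which is why a prespecified hypothesis (or at least a correction for the number of tests) matters more as the number of available variables grows.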