Alan Karr, Ph.D.
Research Triangle International Center of Excellence for Complex Data Analysis

Title: Effects of Method Parameters and Ground Truth in the OMOP Results Database

Abstract: The OMOP results database is a unique dataset containing the results of more than 6 million statistical analyses of patient-level medical data. The analyses are meant to identify drugs associated with particular adverse outcomes. The data are drawn from five large provider databases and cover 181 drugs and 19 outcomes. Seven classes of statistical methods were applied to each (drug, outcome) pair, with 1246 variants arising from parameter settings within each method class. The dataset also contains a binary ground truth for each (drug, outcome) pair. This presentation will describe two sets of condition-specific analyses of the OMOP results database that illuminate the roles of modeling choices and ground truth in determining log relative risk values. The first set consists of partition models that show concretely the effects and interactions of model parameters and ground truth, and that clearly reveal the bias resulting from particular choices of parameters. The second set consists of "efficient frontier" models that identify non-dominated methods on the basis of misclassification rates. A simplified encapsulation of the results is that how the data are analyzed often affects the results more than ground truth does.
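
As a rough illustration of what "non-dominated" means here, the sketch below picks out the methods that no other method beats on both of two misclassification rates. It is not the speaker's implementation: the choice of false positive and false negative rates as the two criteria, the function name non_dominated, and the method names and rate values are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple


def non_dominated(rates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the methods that lie on the efficient frontier.

    `rates` maps a method name to (false positive rate, false negative rate).
    A method is dominated if some other method is at least as good on both
    rates and strictly better on at least one. Lower is better for both.
    """
    frontier = []
    for name, (fp, fn) in rates.items():
        dominated = any(
            (other_fp <= fp and other_fn <= fn)
            and (other_fp < fp or other_fn < fn)
            for other, (other_fp, other_fn) in rates.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical rates for four made-up method variants (not OMOP results).
example = {
    "A": (0.10, 0.40),
    "B": (0.20, 0.20),
    "C": (0.25, 0.30),  # worse than B on both rates, so dominated
    "D": (0.40, 0.10),
}
frontier = non_dominated(example)
print(frontier)
```

In this made-up example, variant C drops out because B has lower rates on both criteria, while A, B, and D each represent a different trade-off and remain on the frontier.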