The Missing Step: Statistical Inference from Big Data

Nicholas Jewell (September 19, 2012)

Abstract

The 20th century revolution in statistics focused on measurement, experimental design, modeling, and computational issues in a world of "small" data, where the number of observations and/or variables was typically limited and information was available from single sources. Scientists face very different challenges in the current age, where data is often streamed in real time and the number of inputs, outputs, or confounders is often massive. This presents challenges for reliable inference about "old" questions, but it also provides opportunities to investigate much more subtle issues about mechanisms of action while reducing our reliance on unnecessary assumptions. We briefly describe some recent advances in data measurement, cleaning, and analysis that reflect these ideas, focusing finally on two applications: (i) determining gene expression signatures of benzene exposure, and (ii) examining the influence of in utero exposure to bisphenol A (BPA) on patterns of weight gain in children.