Metagenomics and metrics on spaces of probability measures

Steven Evans (September 15, 2011)


Abstract

Metagenomics attempts to sample and study all the genetic material present in a community of micro-organisms in environments that range from the human gut to the open ocean. This enterprise is made possible by high-throughput pyrosequencing technologies that produce a "soup" of DNA fragments which are not a priori associated with particular organisms or with particular locations on the genome. Statistical methods can be used to assign these fragments to locations on a reference phylogenetic tree using pre-existing information about the genomes of previously identified species. Each metagenomic sample thus results in a cloud of points on the reference tree. In seeking to answer questions such as what distinguishes the vaginal microbiomes of women with bacterial vaginosis from those of women who don't, one is led to consider statistical methods for distinguishing between such clouds. I will discuss joint work with Erick Matsen from the Fred Hutchinson Cancer Research Center in which we use ideas based on distances between probability measures that go back to Gaspard Monge's 1781 "Mémoire sur la théorie des déblais et des remblais", as well as some familiar objects (e.g. reproducing kernel Hilbert spaces) from the world of Gaussian processes.
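As a rough illustration of the Monge-style transport distance between two clouds of points on a tree, the sketch below computes the earth mover's (Wasserstein-1) distance between two probability measures on a weighted tree. It relies on the standard fact that on a tree the optimal transport cost equals the sum, over edges, of the edge length times the absolute difference in mass that the two measures place below that edge. This is a minimal sketch under stated assumptions, not the method from the talk: the function name `tree_wasserstein`, the parent-array encoding of the tree, and the placement of mass on nodes (real fragment placements may fall partway along an edge) are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def tree_wasserstein(
    parent: List[int],
    length: List[float],
    mu: Dict[int, float],
    nu: Dict[int, float],
) -> float:
    """Earth mover's (Wasserstein-1) distance between two probability
    measures supported on the nodes of a weighted tree.

    Hypothetical encoding (an assumption, not from the talk): node 0 is
    the root, parent[v] < v for every v > 0, and length[v] is the length
    of the edge joining v to parent[v]. Mass sits on nodes only.
    """
    n = len(parent)
    # diff[v] starts as mu({v}) - nu({v}) and is accumulated into the
    # signed mass difference over the whole subtree rooted at v.
    diff = [mu.get(v, 0.0) - nu.get(v, 0.0) for v in range(n)]
    total = 0.0
    # Children have larger indices than their parents, so a reverse sweep
    # finishes each subtree before pushing its total up to the parent.
    for v in range(n - 1, 0, -1):
        # Any surplus or deficit below edge (v, parent[v]) must be
        # transported across that edge, at cost length[v] per unit mass.
        total += length[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total
```

For example, with `parent = [-1, 0, 0, 1]` and `length = [0.0, 1.0, 2.0, 0.5]`, moving a unit mass from node 3 to node 2 costs the path length 0.5 + 1.0 + 2.0 = 3.5, and the function returns 3.5. Because each edge is visited once, the cost is linear in the size of the tree, which is what makes tree-based transport distances cheap compared with general optimal transport.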