Center for Data and Computing May 2019 Speaker Series
All talks will take place 12:00 – 1:00 PM at John Crerar Library, 5730 S. Ellis Ave., Room 390. Lunch will be served.
Join us this May for five illuminating talks from experts at the intersection of data science and domain research, hear about CDAC funding and research opportunities, and connect with the growing UChicago Data Science Community.
May 24: Tandy Warnow, Founder Professor of Computer Science, University of Illinois at Urbana-Champaign
Theoretical and Empirical Advances in Large-Scale Species Tree Estimation
The estimation of the “Tree of Life,” a phylogeny encompassing all life on Earth, is one of the big Scientific Grand Challenges. Maximum likelihood (ML) is a standard approach for phylogeny estimation, but estimating ML trees for large heterogeneous datasets is challenging for two reasons: (1) ML tree estimation is NP-hard (and the best current heuristics can use hundreds of CPU years on relatively small datasets, just to find local optima), and (2) the statistical models used in ML tree estimation methods are much too simple, failing to acknowledge heterogeneity across genomes or across the Tree of Life. These two “big data” issues, dataset size and heterogeneity, impact the accuracy of phylogenetic methods and have consequences for downstream analyses.
In this talk, I will describe a new “divide-and-conquer” approach to phylogeny estimation that addresses both types of heterogeneity. Our protocol operates as follows: (1) we divide the set of species into disjoint subsets, (2) we construct trees on the subsets (using appropriate statistical methods), and (3) we combine the trees together using auxiliary information, such as a matrix of pairwise distances. I will present three such strategies (all published in the last year) that operate in this fashion, and that improve the theoretical and empirical performance of phylogeny estimation methods. One of the main applications of this work is species tree estimation from multi-locus data sets when gene trees can differ from the species tree due to incomplete lineage sorting. This talk is largely based on joint work with my PhD student, Erin Molloy (Illinois).
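For readers new to the idea, the three-step protocol described above can be sketched in a few lines of Python. This is only a toy illustration and not any of the strategies presented in the talk: it assumes average-linkage (UPGMA-style) clustering as a stand-in both for the subset tree estimation and for the merging step, and the function names (divide, average_linkage, estimate_tree) and the six-taxon distance matrix are made up for the example.

```python
from itertools import combinations


def divide(taxa, dist, k):
    """Step 1: split taxa into k disjoint subsets.

    Seeds are chosen farthest-first; each taxon joins its nearest seed.
    """
    seeds = [taxa[0]]
    while len(seeds) < k:
        candidates = [t for t in taxa if t not in seeds]
        seeds.append(max(candidates, key=lambda t: min(dist[t][s] for s in seeds)))
    subsets = {s: [] for s in seeds}
    for t in taxa:
        subsets[min(seeds, key=lambda s: dist[t][s])].append(t)
    return list(subsets.values())


def average_linkage(clusters, dist):
    """Repeatedly join the two clusters with the smallest mean pairwise distance.

    Each cluster is a (tree, members) pair; trees are nested tuples of taxon names.
    """
    clusters = list(clusters)

    def mean_distance(a, b):
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        rest = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters = rest + [((tree_i, tree_j), members_i + members_j)]
    return clusters[0]


def estimate_tree(taxa, dist, k):
    subsets = divide(taxa, dist, k)                       # step 1: disjoint subsets
    subset_trees = [                                      # step 2: one tree per subset
        average_linkage([(t, [t]) for t in subset], dist) for subset in subsets
    ]
    tree, _ = average_linkage(subset_trees, dist)         # step 3: merge using distances
    return tree


# Hypothetical toy data: three tight pairs, with (A,B) closer to (C,D) than to (E,F).
taxa = ["A", "B", "C", "D", "E", "F"]
group = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 2}


def toy_distance(x, y):
    if x == y:
        return 0
    if group[x] == group[y]:
        return 2
    return 6 if {group[x], group[y]} == {0, 1} else 10


dist = {x: {y: toy_distance(x, y) for y in taxa} for x in taxa}
subsets = divide(taxa, dist, 3)
tree = estimate_tree(taxa, dist, 3)
print(subsets)
print(tree)
```

In this sketch the subset trees are never broken apart during merging; the distance matrix only decides how whole subset trees are attached to one another, which is the role the abstract gives to the auxiliary information.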
May 29: Sendhil Mullainathan, Roman Family University Professor of Computation and Behavioral Science, University of Chicago Booth School of Business
Computational Medicine: Data Science as a Tool for Discovery
Much has been made of how algorithms will automate parts of medicine, such as the reading of an X-ray. Such a vision is too short-sighted, failing to recognize a far more transformative role data science can have in medicine. When it comes to the human body, we have far more data than understanding. Rather than simply automating our existing limited knowledge, algorithms can serve to radically expand that knowledge. In this talk I will describe how medicine can be a high-dimensional empirical science: empirical because the basic goal is improved understanding, and high-dimensional because we rely on data, such as imaging, which is by its nature an immensely rich input source. I will illustrate the discoveries to be had when taking such an approach, highlighting the currently overlooked signal in X-rays and ECGs. I will also describe the technical and conceptual challenges that arise in trying to use machine learning algorithms as a tool for scientific discovery.
Data Science Events Around UChicago
Computational Life Sciences Seminar Series
Personalized Medicine to the Bedside: Using Bioinformatics in Pharmacogenomics Translational Research
May 30, 2019
3:00 – 4:00 PM
John Crerar Library, Room 298
Center for Research Informatics
University of Chicago
Keith Danahey of the Center for Research Informatics and the Center for Personalized Therapeutics presents on the process of bringing translational research in genomics to patients at the University of Chicago, Northwestern University, and the University of Illinois at Chicago. Innovative measures include a custom laboratory information management system; Thermo Fisher TaqMan Open Array Assay validation; genomic translation of genotypes to star alleles to phenotypes; the design of a physician-centered portal, the Genomic Prescribing System (GPS); delivery of clinical decision support; Epic integration; multi-institution implementation; genomic data analysis tools; and data visualization techniques for publications. These methods have advanced personalized medicine for patients throughout Chicago.
Environmental Data Science Lunch
This new lunch series is organized by the Center for Spatial Data Science and the Center for Robust Decision-Making on Climate and Energy Policy (RDCEP) / UChicago graduate traineeship program on environmental data science.
Thursdays, 12:00 – 1:30 PM
Searle Chemistry Lab, Room 240A
5735 South Ellis Ave
Chicago, IL 60637