The Center for Data and Computing proudly announces the Autumn 2019 class of Discovery Grants, funding eight projects and one convening that bridge disciplines and push forward the frontiers of data science and artificial intelligence. Discovery Grants provide risk-tolerant seed funding for innovative data science projects intended to achieve a clear impact on major scientific, scholarly, and societal questions.
This round of projects spans from children’s books to the Large Hadron Collider, from ancient cuneiform tablets to wearable devices that detect disease in sweat. The program will fund researchers in medicine, economics, archeology, molecular engineering, computer science, and several other disciplines, working together to tackle complex problems that spill beyond traditional academic boundaries. Though the applications are diverse, the projects overlap in their deployment of cutting-edge data and technological approaches, such as computer vision, machine learning, remote sensing, explainable AI, and predictive analytics.
Learn more about each grant recipient below. You can also read about our inaugural cohort of Discovery Grant recipients, awarded in early 2019. Our next call for Discovery Grant proposals will be announced in mid-2020.
[Headline image from Dustin Kleckner and William Irvine’s study of vortex loops.]
Measuring Messages about Race and Gender: Evidence from Children’s Experience
Early influences that depress children’s beliefs about their own ability can lead to lower educational achievement and persistent disadvantage. In particular, receiving negative messages about gender- and race-specific levels of ability has played a role in generating disadvantage for women and minorities. Children are particularly vulnerable to such messages, as their beliefs about their own capacities are highly malleable.
This project aims to advance understanding of these processes by improving how we estimate the extent and implications of children’s exposure to race- and gender-coded messages. It will develop, verify, and apply new methods of human-directed, machine-implemented content analysis to improve the measurement of these messages, with the goal of better understanding their extent and their contribution to inequality across race and gender.
Learning How To Measure Scientific Images
To reveal and understand physical laws of nature, scientists sometimes create novel experimental systems in their laboratories, which they then capture with video cameras. To fully realize what these systems are showing, however, scientists must also often create new, bespoke computational data analysis pipelines to extract key quantitative parameters from their video image data. This project seeks to discover underlying commonality between different forms of experimental scientific data analyses, and then exploit these shared properties with a combination of machine learning methods (data augmentation and active learning). Ultimately, it seeks to create a visual, interactive data analysis system, based on a neural network for image processing and visualization for display of extracted features, that will lower the human cost of creating new ways to study nature.
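The data augmentation step mentioned above can be illustrated with a minimal sketch: generating flipped and rotated copies of a labeled video frame multiplies scarce training examples for an image-processing network. The frame and the `augment` helper below are invented for illustration and are not part of the project’s actual pipeline.

```python
def augment(image):
    """Return flipped and rotated copies of a small grayscale image
    (a list of rows), multiplying scarce labeled training frames."""
    def rot90(img):
        # Rotate 90 degrees: reverse the rows, then transpose.
        return [list(row) for row in zip(*img[::-1])]

    variants = [image]
    for _ in range(3):                      # 90, 180, 270 degree rotations
        variants.append(rot90(variants[-1]))
    # Add a horizontally mirrored copy of each rotation.
    variants += [[row[::-1] for row in v] for v in variants]
    return variants

# A hypothetical 2x2 "frame" with distinct pixel values.
frame = [[0, 1],
         [2, 3]]
copies = augment(frame)   # 8 distinct orientations of the frame
```

Each labeled frame thus yields eight training examples at no extra annotation cost, one reason augmentation pairs naturally with active learning when labeled scientific video is scarce.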
Automated Prostate Cancer Detection using Hybrid Multi-dimensional MRI and Deep Learning
There is a critical need for new methods for the screening and diagnosis of prostate cancer. Using conventional MRI, around 15 to 30 percent of clinically significant cancers are missed, even by expert radiologists. The application of computer-aided detection and artificial intelligence tools to multi-parametric MRI shows promise in aiding radiologists in prostate cancer diagnosis, but low specificity and high false positive rates remain a concern. This project will assess whether the combination of deep learning methods and data from hybrid multi-dimensional MRI (HM-MRI) — a non-invasive technique developed by UChicago radiologists that provides tissue composition measures similar to the gold standard of pathology — can improve the diagnostic accuracy of detecting prostate cancer.
Towards a Data-driven Trigger System for the Large Hadron Collider
Data-intensive discovery science is increasingly reliant on real-time processing capabilities and machine learning workflows in order to filter and analyze the extreme volumes of data being collected. For example, the search for dark matter at the Large Hadron Collider (LHC) can produce more than 100 terabytes of heterogeneous, high-dimensional data each second. To filter this data, physicists use “trigger algorithms,” the creation of which is resource-intensive and prone to significant blind spots.
This project will investigate the possibility of automating the trigger system of the LHC, applying recent advances in artificial intelligence, including explainable AI, active learning, and imitation learning, to the energy and intensity frontiers of particle physics. The goal is to build a principled, white-box trigger system that can be succinctly explained using existing physics insights, while maintaining an explicit representation of uncertainty that allows for efficient exploration of the vast space of possible event data. The work will lead to new machine learning paradigms as well as novel insights in discovery science.
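To make the filtering idea concrete, here is a minimal sketch of what a trigger loop does: keep events a score function flags as interesting, while retaining a small random sample of the rest so rare signatures are not silently discarded. The hand-written score stands in for a learned model, and the random keep rate is a crude stand-in for uncertainty-driven exploration; all names and numbers are hypothetical, not the project’s actual system.

```python
import random

def toy_trigger(events, score_fn, keep_threshold=0.9, explore_rate=0.01, rng=None):
    """Keep events whose score meets the threshold, plus a small random
    sample of the rest (a crude stand-in for uncertainty-driven
    exploration of the event space)."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    kept = []
    for event in events:
        if score_fn(event) >= keep_threshold or rng.random() < explore_rate:
            kept.append(event)
    return kept

# Hypothetical events: each carries a single "energy" value.
events = [{"energy": e} for e in range(1000)]

# A hand-written score flagging high-energy events (placeholder for a model).
score = lambda ev: 1.0 if ev["energy"] > 950 else 0.0

selected = toy_trigger(events, score)  # all high-energy events, plus ~1% of the rest
```

A learned, white-box trigger would replace the hand-written score with a model whose decisions can be traced back to physics-meaningful features, and would set the exploration rate from its own uncertainty rather than a fixed constant.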
Leveraging Machine Learning and Satellite Imaging to Reduce Oil and Gas Methane Emissions
The meteoric rise of shale oil and gas drilling in the United States poses significant challenges for reducing greenhouse gas emissions. The methane emitted from these operations has 34 times the short-term global warming potential of CO2, making it a potent contributor to climate change. Existing methods of measuring methane emissions are imprecise and expensive, severely limiting regulators’ ability to monitor emissions and enforce regulations. Without reliable estimates of methane emissions, regulators cannot efficiently target leaks, incentivize their prevention, and realize climate benefits.
This project will develop and test a data-driven approach to methane emissions monitoring and regulation, in close partnership with Colorado regulators. It will leverage large-scale administrative data such as permitting records and historical inspections to build a supervised machine learning model that predicts methane leaks at facilities. These predictions can then be used to target the collection of high-resolution emissions measurements at the facility level using state-of-the-art satellites. This breakthrough combination of machine learning and new measurement technology would provide regulators with a potentially highly cost-effective and scalable inspection targeting framework. Finally, a randomized controlled trial will rigorously estimate the impact of applying the novel technology for regulatory enforcement, and estimate the benefits of improved emissions monitoring.
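A minimal sketch of the supervised prediction step: fit a classifier on historical inspection outcomes, then rank unmonitored facilities by predicted leak risk to decide where to point the satellites. The facility features (well age, past violations), the data, and the from-scratch logistic regression are all invented for illustration; real work would use richer administrative data and a production ML library.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression via stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted leak probability
            err = p - yi                      # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def leak_risk(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical history: ([well age in decades, past violations], leak found?)
history = [([0.5, 0], 0), ([2.0, 3], 1), ([1.0, 1], 0),
           ([3.0, 4], 1), ([0.8, 0], 0), ([2.5, 2], 1)]
X, y = [h[0] for h in history], [h[1] for h in history]
w, b = fit_logistic(X, y)

# Rank unmonitored facilities; target the riskiest for satellite follow-up.
candidates = {"A": [0.3, 0], "B": [2.8, 3], "C": [1.2, 1]}
ranked = sorted(candidates, key=lambda k: leak_risk(w, b, candidates[k]),
                reverse=True)
```

The ranking, not the raw probabilities, is what drives the inspection-targeting framework: with a fixed satellite budget, regulators measure the top of the list first.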
Computer-Assisted Diagnosis of Indeterminate Thyroid Lesions
Thyroid nodules are very common, and 5 to 15 percent of them are cancerous. To determine whether a thyroid nodule is benign or malignant, physicians perform a biopsy of the nodule. However, in up to 40 percent of cases, the biopsy is indeterminate, meaning the pathologist cannot differentiate between benign and malignant cells. Patients are then referred for surgical excision of these nodules in order to obtain a definitive pathological diagnosis. But 70 percent of patients undergoing thyroid surgery for diagnostic purposes turn out to have a benign nodule on final diagnosis, meaning the surgery was unnecessary.
This project will develop machine learning software to define imaging features of benign, malignant, and indeterminate thyroid lesions on ultrasound and relate these findings with current molecular testing. This new computer-assisted diagnosis approach aims to accurately differentiate whether an indeterminate thyroid nodule is benign or malignant, thus potentially avoiding unnecessary surgery in thousands of patients every year in the US. It will foster collaborations between diagnostic and therapeutic thyroid specialists and machine learning experts to study one of the important diagnostic challenges in modern medicine.
DeepScribe: Deciphering Cuneiform with Artificial Intelligence
Manually deciphering a cuneiform tablet is a laborious, time-consuming, and error-prone process. This project explores how recent advances in computer vision can assist researchers by automatically identifying symbols and words in images of cuneiform tablets. It will leverage the extensively annotated collections of the Online Cultural and Historical Research Environment (OCHRE) as training data for machine learning vision models that, in preliminary results, achieve up to 83 percent accuracy. This Discovery Grant will facilitate an important step toward the goal of meaningful automatic transcription and indexing of the extensive worldwide cuneiform tablet collection.
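As a toy illustration of image-based sign identification, the sketch below classifies tiny binary “glyph” bitmaps by nearest neighbor. The bitmaps and sign names are invented, and the project’s actual models (trained on OCHRE’s annotated tablet images) are far more sophisticated, but the core task is the same: map a pixel pattern to a known sign.

```python
def hamming(a, b):
    """Count the pixels at which two flattened bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))

# 3x3 "glyphs" flattened to 9 pixels; sign names are illustrative only.
training = {
    "ANSHE": (1, 1, 1,
              0, 1, 0,
              0, 1, 0),
    "LUGAL": (1, 0, 1,
              1, 0, 1,
              1, 1, 1),
}

def classify(pixels):
    """Return the training sign whose bitmap is closest to the input."""
    return min(training, key=lambda name: hamming(training[name], pixels))

# A slightly noisy impression of the first sign still matches it.
sample = (1, 1, 1,
          0, 1, 0,
          0, 0, 0)
```

Replacing the pixel-distance rule with a trained convolutional network, and the two toy glyphs with thousands of annotated sign images, turns this sketch into the kind of model the project evaluates.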
Health Monitoring Based on Wearable Sweat Sensors
Wearable health monitoring devices can potentially change healthcare through continuous monitoring of people’s physiological conditions. There are a variety of chemical biomarkers in human body fluids that carry vast amounts of information linked to health conditions. For example, sweat contains biomarkers that indicate conditions such as lung infections, depression, and tuberculosis. Unfortunately, the only way that doctors and patients have access to this data today is via lab tests.
We will explore engineering a wearable that features skin-compatible chemical sensors. Our sweat sensor is based on organic transistors that are engineered to be skin-conformable, making it an ideal candidate for a wearable device laminated on the user’s skin. This will enable real-time, continuous sampling of biomarkers from sweat, which could open new avenues for medicine. Furthermore, the analysis of chemical biomarkers, such as the 220 proteins that can be found in sweat, affords a diverse and complex data space. Unlike data from more straightforward sources, such as traditional heart monitors, the vast amount of data captured by a sweat sensor requires complex analysis, which we will explore through automated machine learning methods.
An anvi’o workshop at the University of Chicago
A. Murat Eren, Medicine
Advances in molecular tools and sequencing chemistry have turned every corner of biology into a ‘data-enabled’ discipline. Microbiology, the study of the most diverse and numerous form of life, has been swept up in this revolution. Tremendous amounts of new data offer detailed snapshots of naturally-occurring microbial life and promise new insights into the microbial processes that make our planet tick. But this unprecedented access to endless data streams comes with a price. On the one hand, an understanding of new ‘omics approaches and skills in computation have become de facto necessities for microbiologists. On the other hand, limited educational and training opportunities, along with common technical shortcomings of available software, have confined a large fraction of data-enabled microbiological investigations to “what is doable” given the existing tools, rather than “what is possible” given the data.

To bridge these gaps, my group has developed anvi’o, a platform that facilitates state-of-the-art ‘omics analyses. Anvi’o comes with a steep learning curve but offers microbiologists full control over their data. This workshop will train a diverse group of microbiologists on anvi’o while gathering insight into their needs in an active learning environment.