  • Description

    Autumn 2021 Workshop Agenda (as of 10/26/21)

    The Rising Stars in Data Science workshop at the University of Chicago focuses on celebrating and fast-tracking the careers of exceptional data scientists at a critical inflection point: the transition to a postdoctoral scholar, research scientist, industry research, or tenure-track position. The Autumn 2021 workshop builds upon the success of the virtual January 2021 workshop; it will showcase the exciting, innovative data science initiatives at UChicago and provide PhD students and postdocs the opportunity to plug into these networks, platforms, and opportunities. The workshop also aims to increase representation and diversity in data science by providing a platform and a supportive mentoring network for navigating academic careers in data science. Women and underrepresented minorities in computing are especially encouraged to apply.

    The two-day research workshop will feature career and research panels, networking and mentoring opportunities, and research talks from the Rising Stars. Participants will gain insights from faculty panels on career development questions such as: how to start your academic career in data science; how to strategically sustain your career through research collaborations, publications, and skill development; and how to form meaningful interdisciplinary collaborations in data science with industry and government partners. Participants will also hear inspiring keynote talks from established, cutting-edge leaders in data science.

    Workshop Dates: November 11-12, 2021

    Eligibility & Guidelines: 

    If you have any questions about your eligibility, please feel free to send an email to cdac@uchicago.edu.

    • Applicants must be full-time graduate students within ~1-2 years of obtaining a PhD, or current postdoctoral scholars, fellows, or researchers.
    • We welcome applicants from a wide variety of fields and backgrounds: any eligible PhD or postdoc who is engaging in rigorous, data-driven inquiry is encouraged to apply.
    • Applicants both from and outside of the University of Chicago are encouraged to apply.
    • Applicants may only submit one application.
    • Applicants may be nominated (via the Google Form on the application) by a maximum of 2 faculty members or advisors.

    Workshop Format

    • Rising Star research talks
    • Panels (career development, data science research)
    • Keynote address
    • 1:1 meetings with faculty members
    • Networking within the UChicago data science ecosystem

    This convening is open to all invitees who are compliant with UChicago vaccination requirements. Because of ongoing health risks, particularly to the unvaccinated, participants are expected to adopt the risk-mitigation measures (masking, social distancing, etc.) appropriate to their vaccination status, as advised by public health officials, or to their individual vulnerabilities, as advised by a medical professional. Public convening may not be safe for all and carries a risk of contracting COVID-19, particularly for those unvaccinated. Participants will not know the vaccination status of others and should follow appropriate risk-mitigation measures.

    Campus Visitor Information & COVID Policy

  • Rising Stars Profiles

    View alumni Rising Stars profiles here.

    Autumn 2021 Rising Stars Cohort

    Bio: Maria Antoniak is a PhD candidate in Information Science at Cornell University. Her research focuses on unsupervised natural language processing methods and applications to computational social science and cultural analytics. Her work translates methods from natural language processing to insights about communities and self-disclosure by modeling personal experiences shared in online communities. She has a master’s degree in computational linguistics from the University of Washington and a bachelor’s degree in humanities from the University of Notre Dame, and she has completed research internships at Microsoft, Facebook, Twitter, and Pacific Northwest National Laboratory.

    Talk Title: Modeling Personal Experiences Shared in Online Communities

    Talk Abstract: Written communications about personal experiences—and the emotions, narratives, and values that they contain—can be both rhetorically powerful and statistically difficult to model. The first goal of my research is to use natural language processing methods to represent complex personal experiences and self-disclosures communicated in online communities. Two fruitful sites for this research are online communities grounded in structured cultural experiences (books, games) and online communities grounded in healthcare experiences (childbirth, contraception, pain management). These communities situate personal opinions and stories in social contexts of reception, expectation, and judgment. The second goal of my research is critical re-examination of measurement methods: I probe models designed for traditional natural language processing tasks involving large, generic datasets by exploring their results on small, socially-specific datasets that are popular in cultural analytics and computational social science.

    Bio: Arjun studies the security of machine learning systems, with a focus on adversarial and distributed learning. His work has exposed new vulnerabilities in learning algorithms and developed a theoretical framework to analyze them. He was a finalist for the 2020 Bede Liu Best Dissertation Award, and won the 2019 Yan Huo *94 Graduate Fellowship and the 2018 SEAS Award for Excellence at Princeton University. He received the 2018 Siemens FutureMakers Fellowship in Machine Learning, and was a finalist for the 2017 Bell Labs Prize. He is currently a postdoctoral scholar at UChicago with Ben Zhao and Nick Feamster.

    Talk Title: The Role of Data Geometry in Adversarial Machine Learning

    Talk Abstract: Understanding the robustness of machine learning systems has become a problem of critical interest due to their increasing deployment in safety-critical systems. Of particular interest are adversarial examples, which are maliciously perturbed test-time examples designed to induce misclassification. Most research on adversarial examples has focused on developing better attacks and ad hoc defenses, resulting in an attacker-defender arms race.

    In this talk, we will step away from this paradigm and show how fundamental bounds on learning in the presence of adversarial examples can be obtained by viewing the problem through an information-theoretic lens. For fixed but arbitrary distributions, we demonstrate lower bounds on both the 0-1 and cross-entropy losses for robust learning. We compare these bounds to the performance of state-of-the-art robust classifiers and analyze the impact of different layers on robustness.
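
    For readers unfamiliar with adversarial examples, the sketch below shows the basic attack in its simplest form: a one-step, gradient-sign perturbation of an input to a linear logistic model. The model, data, and perturbation budget are illustrative assumptions, not material from the talk.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy binary classification setup with a fixed linear logistic model.
    w = rng.normal(size=5)          # model weights (assumed already trained)
    b = 0.1
    x = rng.normal(size=5)          # a clean test example
    y = 1                           # its true label

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def loss_grad_x(x, y):
        # Gradient of the logistic loss with respect to the input x.
        p = sigmoid(w @ x + b)
        return (p - y) * w

    # Fast-gradient-sign-style perturbation: move each input coordinate
    # by epsilon in the direction that increases the loss.
    epsilon = 0.3
    x_adv = x + epsilon * np.sign(loss_grad_x(x, y))

    print("clean score:", sigmoid(w @ x + b))
    print("adversarial score:", sigmoid(w @ x_adv + b))
    ```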

    Bio: Lingjiao Chen is a PhD candidate in the computer sciences department at Stanford University. He is broadly interested in machine learning, data management and optimization. Working with Matei Zaharia and James Zou, he is currently exploring the fast-growing marketplaces of artificial intelligence and data. His work has been published at premier conferences and journals such as ICML, NeurIPS, SIGMOD and PVLDB, and partially supported by a Google fellowship.

    Talk Title: Understanding and Exploiting Machine Learning Prediction APIs

    Talk Abstract: Machine Learning (ML) prediction APIs are a fast-growing industry and an important part of ML as a service. For example, one could use Google prediction API to classify an image for $0.0015 or to classify the sentiment of a text passage for $0.00025. While many such services are available, the heterogeneity in their price and performance makes it challenging for users to decide which API or combination of APIs to use for their own data.

    In this talk, I will present FrugalML, a principled framework that jointly learns the strength and weakness of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint. Our theoretical analysis shows that natural sparsity in the formulation can be leveraged to make FrugalML efficient. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Baidu and other providers for tasks including facial emotion recognition, sentiment analysis and speech recognition. Across various tasks, FrugalML can achieve up to 90% cost reduction while matching the accuracy of the best single API, or up to 5% better accuracy while matching the best API’s cost. If time permits, I will also discuss recent follow-up studies on API performance shifts and multi-label APIs.
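
    A toy sketch of the cost/accuracy trade-off FrugalML navigates: a two-stage cascade that calls a cheap API first and escalates to a stronger one only when confidence is low. The accuracies, confidence rule, and per-call costs below are illustrative assumptions; FrugalML's actual algorithm learns the strategy per input rather than using a single fixed threshold.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-ins for two prediction APIs: each call returns a
    # (correct?, confidence) pair. Costs loosely follow the talk's examples.
    def cheap_api():
        correct = rng.random() < 0.80            # assumed 80% accurate
        conf = rng.uniform(0.5, 1.0) if correct else rng.uniform(0.3, 0.8)
        return correct, conf

    def strong_api():
        return rng.random() < 0.95, 1.0          # assumed 95% accurate

    CHEAP_COST, STRONG_COST = 0.00025, 0.0015

    def cascade(threshold, n=20_000):
        # Call the cheap API first; escalate only when its confidence is low.
        cost = hits = 0.0
        for _ in range(n):
            correct, conf = cheap_api()
            cost += CHEAP_COST
            if conf < threshold:
                correct, _ = strong_api()
                cost += STRONG_COST
            hits += correct
        return hits / n, cost / n

    for t in (0.0, 0.6, 0.8):
        acc, avg_cost = cascade(t)
        print(f"threshold={t:.1f}  accuracy={acc:.3f}  cost/query=${avg_cost:.5f}")
    ```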

    Bio: Amrita Roy Chowdhury is a PhD student at the University of Wisconsin-Madison, advised by Prof. Somesh Jha. She completed her Bachelor of Engineering in Computer Science at the Indian Institute of Engineering Science and Technology, Shibpur, where she was awarded the President of India Gold Medal. Her work explores the synergy between differential privacy and cryptography through novel algorithms that expose the rich interconnections between the two areas, both in theory and practice. She has been recognized as a Rising Star in EECS at MIT (2021) and UC Berkeley (2020), and was a 2021 Facebook Fellowship finalist. She has also been awarded the 2021 CRA/CCC Computing Innovation Fellowship.

    Talk Title: Crypt$\epsilon$: Crypto-Assisted Differential Privacy on Untrusted Servers

    Talk Abstract: Differential privacy (DP) is currently the de facto standard for achieving privacy in data analysis, which is typically implemented either in the "central" or "local" model. The local model has been more popular for commercial deployments as it does not require a trusted data collector. This increased privacy, however, comes at the cost of utility and algorithmic expressibility as compared to the central model.

    In this talk, I will be presenting Crypt$\epsilon$, a system and programming framework that (1) achieves the accuracy guarantees and algorithmic expressibility of the central model (2) without any trusted data collector like in the local model. Crypt$\epsilon$ achieves the ”best of both worlds” by employing two non-colluding untrusted servers that run DP programs on encrypted data from the data owners. In theory, straightforward implementations of DP programs using off-the-shelf secure multi-party computation tools can achieve the above goal. However, in practice, they are beset with many challenges like poor performance and tricky security proofs. To this end, Crypt$\epsilon$ allows data analysts to author logical DP programs that are automatically translated to secure protocols that work on encrypted data. These protocols ensure that the untrusted servers learn nothing more than the noisy outputs, thereby guaranteeing DP for all Crypt$\epsilon$ programs. Crypt$\epsilon$ supports a rich class of DP programs that can be expressed via a small set of transformation and measurement operators followed by arbitrary post-processing. Further, I will talk about a novel performance optimization that leverages the fact that the output is noisy. Consequently, Crypt$\epsilon$ achieves performance that is practical for real-world usage.

    Bio: Xiaoan Ding is a Ph.D. candidate in the Department of Computer Science at the University of Chicago, advised by Prof. Kevin Gimpel. Her interests lie in developing machine learning methods for natural language processing and applying deep learning to language applications. Her research seeks to build data-efficient, resilient, fair, and trusted models for text classification and generation, with her Ph.D. work focusing on developing models and algorithms spanning these directions. In the past, she has interned with the Microsoft Research NLP group (hallucination detection), Amazon Alexa AI (neural information retrieval), and the Google dialogue group (task-oriented dialogue systems).

    Talk Title: Data-Efficient Text Classifier for Robust NLP

    Talk Abstract: With the unprecedented progress in deep learning architectures, large-scale training, and learning algorithms, pretrained models have become pivotal in AI. Concurrently, the notion of model robustness has broadened to encompass data-efficiency, model resilience, fairness, and faithfulness. In this talk, I will focus on the data-efficiency and model-resilience aspects and present my efforts to build robust text classifiers in which we introduce discrete latent variables into the generative story. In modeling, we parameterize the distributions using standard neural architectures from conditional language modeling. Our training objective combines generative pretraining and discriminative finetuning. The results show that our generative classifiers outperform discriminative baselines, including BERT-style models, across several challenging experimental settings.
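
    As background, the decision rule of a generative classifier can be illustrated with the simplest possible case, naive Bayes, which picks argmax_y p(y)p(x|y) from count-based estimates. The talk's models are far richer (neural conditional language models with discrete latent variables); this sketch, with invented toy data, conveys only the generative decision rule itself.

    ```python
    import numpy as np

    # Toy labeled documents (text, label); 1 = positive, 0 = negative.
    docs = [("good great fun", 1), ("bad awful boring", 0),
            ("great story", 1), ("boring bad plot", 0)]
    vocab = sorted({w for text, _ in docs for w in text.split()})

    counts = np.ones((2, len(vocab)))            # Laplace smoothing
    priors = np.zeros(2)
    for text, y in docs:
        priors[y] += 1
        for w in text.split():
            counts[y, vocab.index(w)] += 1

    log_prior = np.log(priors / priors.sum())
    log_pwy = np.log(counts / counts.sum(axis=1, keepdims=True))

    def classify(text):
        # Generative rule: argmax_y log p(y) + sum_w log p(w | y).
        scores = log_prior.copy()
        for w in text.split():
            if w in vocab:
                scores += log_pwy[:, vocab.index(w)]
        return int(np.argmax(scores))

    print(classify("fun great"))    # -> 1
    print(classify("awful plot"))   # -> 0
    ```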

    Bio: I am a postdoctoral researcher in the Department of Statistics at Harvard University. My research interests lie at the intersection of high-dimensional statistics and applied probability. Currently, I am excited about understanding phase transitions, universality, and computational-statistical gaps in high-dimensional inference problems. Before joining Harvard, I obtained a Ph.D. in Statistics from Columbia University and a B.Tech. in Electrical Engineering from the Indian Institute of Technology, Delhi. I received the Edward Prince Goldman Scholarship in Science from The New York Community Trust in 2021 for my Ph.D. dissertation work.

    Talk Title: High-Dimensional Asymptotics for Phase Retrieval with Structured Sensing Matrices

    Talk Abstract: Phase Retrieval is the problem of recovering an unknown complex-valued signal vector from the magnitudes of several linear measurements. This problem arises in applications like X-ray crystallography, where it is infeasible to acquire the phase of the measurements. In this talk, I will describe some results regarding the analysis of this problem in the high-dimensional asymptotic regime where the number of measurements and the signal dimension diverge proportionally so that their ratio remains fixed. The measurement mechanism in phase retrieval is specified by a sensing matrix. A limitation of existing high-dimensional analyses of this problem is that they model this matrix as a random matrix with independent and identically distributed (i.i.d.) Gaussian entries. In practice, this matrix is highly structured with limited randomness. I will describe a correction to the i.i.d. sensing model, known as the sub-sampled Haar sensing model, which faithfully captures a crucial orthogonality property of realistic sensing matrices. For the Haar sensing model, I will present a precise asymptotic characterization of the performance of commonly used spectral estimators for solving the phase retrieval problem. This characterization can be leveraged to tune certain parameters involved in the spectral estimator optimally. The resulting estimator is information-theoretically optimal. Next, I will describe an empirical universality phenomenon: the performance curves derived for the Haar model accurately describe the observed performance curves for realistic sensing matrices. Finally, I will present recent progress towards obtaining a theoretical understanding of this universality phenomenon that causes practical sensing matrices to behave like Haar sensing matrices.
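
    A minimal sketch of the kind of spectral estimator the talk analyzes, in the i.i.d. Gaussian sensing model: build a weighted covariance-like matrix from the phaseless measurements and take its top eigenvector. The truncation preprocessing and problem sizes here are illustrative choices, not the talk's optimally tuned estimator for the Haar model.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 100, 800                  # signal dimension, number of measurements

    x = rng.normal(size=n)
    x /= np.linalg.norm(x)           # unknown unit-norm signal
    A = rng.normal(size=(m, n)) / np.sqrt(n)   # i.i.d. Gaussian sensing matrix
    y = (A @ x) ** 2                 # phaseless (magnitude-squared) measurements

    # Spectral estimator: top eigenvector of sum_i T(y_i) a_i a_i^T / m,
    # where T is a preprocessing function (a simple truncation here; the
    # talk concerns optimal choices of T under Haar sensing).
    T = np.minimum(y, 3.0 * y.mean())
    D = (A.T * T) @ A / m
    eigvals, eigvecs = np.linalg.eigh(D)
    x_hat = eigvecs[:, -1]           # eigenvector of the largest eigenvalue

    print("cosine overlap |<x_hat, x>| =", abs(x_hat @ x))
    ```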

    Bio: Shi Feng is a postdoctoral fellow at the University of Chicago working on human-in-the-loop and interpretable NLP. Recently, he has focused on investigating the role of interpretability in the alignment of NLP systems. He holds a PhD from the University of Maryland, where he was supervised by Jordan Boyd-Graber.

    Talk Title: Towards AIs That Help Humans Work Better

    Talk Abstract: This talk focuses on developing machine learning models that are maximally useful to humans. Our primary goal is to improve the productivity of human-AI cooperation on important decision-making problems by understanding how humans and AI interact. In the traditional approach to machine learning, humans are treated as either rivals or teachers. However, machine learning can make up for some of the shortcomings of humans. Treating humans as collaborators opens up several new directions of research.

    In the first part of the talk, we use flashcard learning as a testbed and study how human productivity can benefit from consuming information generated by machine learning models. In the second part, we consider humans as decision makers, and investigate how explanations of machine learning predictions can improve the performance of human-AI teams on sequential decision making problems. Finally, we study the limitations of natural language explanations for model predictions, as well as novel methods to improve them.

    Bio: Anjalie is a PhD candidate at Carnegie Mellon University, where she is advised by Yulia Tsvetkov. Her work focuses on the intersection of NLP and computational social science, including using NLP models to examine social issues like propaganda, stereotypes, and prejudice. She has presented her work in NLP and interdisciplinary conferences, receiving a nomination for best paper at SocInfo 2020, and she is also the recipient of an NSF Graduate Research Fellowship and a Google PhD Fellowship. Prior to graduate school, she received her undergraduate degree in computer science, with minors in Latin and ancient Greek, from Princeton University.

    Talk Title: Building Language Technologies for Analyzing Online Activism

    Talk Abstract: While recent advances in natural language processing (NLP) have greatly enhanced our ability to analyze online text, distilling broad social-oriented research questions into tasks concrete enough for NLP models remains challenging. In this work, we develop state-of-the-art NLP models grounded in frameworks from social psychology in order to analyze two social media movements: online media coverage of the #MeToo movement in 2017-2018 and tweets about #BlackLivesMatter protests in 2020. In the first part, we show that despite common perception of the #MeToo movement as empowering, media coverage of events often portrayed women as sympathetic but unpowerful. In the second, we show that positive emotions like hope and optimism are prevalent in tweets with pro-BlackLivesMatter hashtags and significantly correlated with the presence of on-the-ground protests, whereas anger and disgust are not. These results contrast with stereotypical portrayals of protesters as perpetuating anger and outrage. Overall, our work provides insight into social movements and debunks harmful stereotypes. We aim to bridge the gap between NLP, where models are often not designed to address social-oriented questions, and computational social science, where state-of-the-art NLP has often been underutilized.

    Bio: Neil Gaikwad is a doctoral scholar at MIT, specializing in human-centered AI and public policy for sustainable systems. His research focuses on the design of data-intensive social computations to inform equitable public policy decisions underpinning sociotechnical systems in low-resource environments. This scholarship has been published in premier AI and HCI conferences and featured in the New York Times, New Scientist, WIRED, and the Wall Street Journal. Neil's research, teaching, leadership, and commitment to DEI have been recognized with a Facebook Ph.D. Fellowship, the William Asbjornsen Albert Memorial MIT Science & Engineering Fellowship, the MIT Graduate Teaching Award, and the Karl Taylor Compton Prize, MIT's highest student award.

    Talk Title: Human-centered AI and Public Policy focused on Equitable Designs of Sociotechnical Systems for Global Inclusion

    Talk Abstract: Climate change, poverty, growing inequity, and declining common-pool resources have led to critical breakdowns in sociotechnical systems such as markets, food, and urban systems. These systems have been instrumental for the sustainability of our society and planet. Today, they fail to adapt to growing socioeconomic and environmental threats, disproportionately endangering historically disadvantaged communities worldwide. For instance, extreme breakdowns in food and market systems have led to over 300,000 farmer suicides in the last two decades. While data-intensive computing can transform designs of the systems and policymaking processes for improving the livelihood of underserved communities, their prejudiced designs reinforce colonial legacies, further amplifying existing inequalities. In this talk, I present a novel research program focused on the equitable design, engineering, and governance of human-centered AI and social computations to inform just public policy decisions underpinning sociotechnical systems in low-resource environments. I will introduce community-based design principles to develop and deploy data-intensive sociotechnical systems (e.g., just markets for mitigating the risk of climate extremes on vulnerable farmers) that harness human and machine intelligence for resilience, sustainability, and equity. This scholarship has led to a Data-driven Humanitarian Mapping Research and Policy Initiative that brings together stakeholders from industry, academia, NGOs, and governments to tackle overarching sustainability challenges. The research program addresses a notable paucity of community-led design research in data science and public policy.

    Bio: Sainyam Galhotra is a CI postdoctoral fellow at the University of Chicago. He received his Ph.D. from the University of Massachusetts Amherst. Previously, he was a researcher at Xerox Research and received his Bachelor's degree in computer science from the Indian Institute of Technology, Delhi. His research is broadly in the area of data management, with a specific focus on designing algorithms that are not only efficient but also transparent and equitable in their decision-making capabilities. He is a recipient of the Best Paper Award at FSE 2017 and the Most Reproducible Paper Award at SIGMOD 2017 and 2018. He is a DAAD AInet Fellow and the first recipient of the Krithi Ramamritham Award at UMass for contributions to database research.

    Talk Title: Designing a Privacy-aware Fair Trade Marketplace for Data

    Talk Abstract: A growing number of data-based applications are used for decision-making with far-reaching consequences and significant societal impact. The increased availability of data has heightened the importance of designing effective techniques for data sharing that are not only scalable to large datasets but also transparent and equitable in their decision-making capabilities. My research focuses on these different facets of data science and is aimed at designing a fair-trade marketplace as a novel data sharing paradigm that can address the unique challenges in the path of meaningful and equitable commoditization of data. In this talk, I will present a multi-pronged approach to these challenges. First, I will present a novel data discovery system that constructs bespoke datasets satisfying user requirements. Second, I will present the challenges of deploying data marketplaces in practice and ways to mitigate them to maintain the robustness of the market design against adversarial attacks from different entities (buyers or sellers). Third, I will present a suite of techniques to ensure transparency and build the trust of the involved entities.

    Bio: Mengdi Huai is a Ph.D. candidate in the Department of Computer Science at the University of Virginia, advised by Professor Aidong Zhang. Her research interests are in the general area of data mining and machine learning, with an emphasis on the aspects of model transparency, security, privacy and algorithm design. Mengdi’s research has been published in international conferences and journals, including top conferences in data mining and AI (KDD, AAAI, IJCAI, NeurIPS, WWW, ICDM, SDM, BIBM) and top journals (TKDD, NanoBioscience). She has received multiple awards, including the Rising Star in EECS at MIT, the John A. Stankovic Research Award, the Sture G. Olsson Fellowship in Engineering, and the Best Paper Runner-up for KDD2020.

    Talk Title: Malicious Attacks on Interpretable Deep Reinforcement Learning

    Talk Abstract: The past years have witnessed the rapid development of deep reinforcement learning (DRL), which incorporates deep learning into the solution and makes decisions from unstructured input data without manual engineering of the state space. However, the adoption of deep neural networks makes the decision-making process of DRL opaque and lacking in transparency. Motivated by this, various interpretation methods for DRL have been proposed. These interpretation methods make an implicit assumption that they are performed in a reliable and secure environment. However, given their data-driven nature, these DRL interpretation methods are themselves potentially susceptible to malicious manipulations. Despite the prevalence of malicious attacks, there is no existing work studying the possibility and feasibility of malicious attacks against DRL interpretations. To bridge this gap, in my work, I investigated the vulnerability of DRL interpretation methods. Specifically, I introduced the first study of adversarial attacks against DRL interpretations, and proposed an optimization framework from which the optimal adversarial attack strategy can be derived. In addition, I studied the vulnerability of DRL interpretation methods to model poisoning attacks, and presented an algorithmic framework to rigorously formulate the proposed model poisoning attack. Finally, I conducted both theoretical analysis and extensive experiments to validate the effectiveness of the proposed malicious attacks against DRL interpretations.

    Bio: Haojian Jin is a final-year Ph.D. student in the Human-Computer Interaction Institute at Carnegie Mellon University, advised by Jason Hong and Swarun Kumar. Haojian's research explores new software architectures and toolkits that make it easier for users, developers, and auditors to protect users' privacy. His work has been recognized with a UbiComp Gaetano Borriello Outstanding Student Award, Research Highlights at Communications of the ACM and GetMobile, and best paper awards at UbiComp and ACM Computing Reviews.

    Talk Title: My Data is None of Your Business: Separation of Concerns for Privacy through Modular Privacy Flows

    Talk Abstract: The wide-scale deployment of tiny sensors, coupled with improvements in recognition and data mining algorithms, will enable numerous new applications for personal and societal benefit. But we have also seen many undesired data-driven applications deployed, such as price discrimination and shopping-behavior persuasion. Once data is out of users' direct control, it may be used at places and times far removed from its original context. How can we computer scientists assure users that a data-driven world is one everyone wants to live in?

    In this talk, I will introduce my thesis work on separating concerns for privacy through a new software design pattern, named Modular Privacy Flows. Rather than continuing to build privacy support in an ad-hoc manner, my research demonstrates how we can separate the privacy logic from the application logic. This separation can help users gain independent and unified control of their data while reducing the burdens of developers and auditors on ensuring privacy.
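
    A toy sketch of the separation-of-concerns idea: the privacy logic (data minimization and retention) lives in one reusable wrapper, while the application logic stays privacy-agnostic. This only illustrates the design pattern in miniature; it is not the actual Modular Privacy Flows architecture, and all names below are invented.

    ```python
    import functools

    def privacy_flow(*, allowed_fields, retain=False):
        # Privacy logic, written once and reused: minimize the record to the
        # declared fields and drop the data after use unless retention is allowed.
        def decorate(app_fn):
            @functools.wraps(app_fn)
            def wrapper(record):
                minimized = {k: v for k, v in record.items() if k in allowed_fields}
                result = app_fn(minimized)
                if not retain:
                    minimized.clear()          # discard data after use
                return result
            return wrapper
        return decorate

    @privacy_flow(allowed_fields={"steps", "heart_rate"})
    def daily_fitness_summary(record):
        # Application logic: never sees the location or contacts fields.
        return {"active": record["steps"] > 8000}

    user_record = {"steps": 9500, "heart_rate": 72,
                   "location": "41.79,-87.60", "contacts": ["alice", "bob"]}
    print(daily_fitness_summary(user_record))
    ```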

    Bio: I am a Postdoc working with Robert Nowak at the University of Wisconsin. Previously, I was a Postdoc at the Paul G. Allen School of Computer Science & Engineering at the University of Washington under Kevin Jamieson. I completed my PhD in the Electrical Engineering and Computer Science Department at the University of Michigan where my advisor was Clayton Scott. Prior to that, I double-majored in mathematics and philosophy at the University of Chicago. My research focuses on pure exploration multi-armed bandits, recommender systems, and nonparametric estimation. I am also interested in applications of machine learning that promote the social good. As a Data Science for Social Good fellow at the University of Chicago in 2015, I helped develop the Legislative Influence Detector.

    Talk Title: Practical Algorithms for Interactive Learning with Generic Function Classes

    Talk Abstract: We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification. We begin our investigation with the observation that agnostic algorithms cannot be minimax-optimal in the realizable setting. Hence, we design novel algorithms for the realizable setting that are nearly minimax optimal, computationally efficient, and general-purpose, accommodating a wide variety of function classes including kernel methods, Hölder smooth functions, and convex functions. The sample complexities of our algorithms can be quantified in terms of well-known quantities like the extended teaching dimension and haystack dimension. However, unlike algorithms based directly on those combinatorial quantities, our algorithms are computationally efficient. To achieve computational efficiency, our algorithms sample from the version space using Monte Carlo "hit-and-run" algorithms instead of maintaining the version space explicitly. Our approach has two key strengths. First, it is simple, consisting of two unifying, greedy algorithms. Second, our algorithms have the capability to seamlessly leverage prior knowledge that is often available and useful in practice. In addition to our new theoretical results, we demonstrate empirically that our algorithms are competitive with and in some cases outperform Gaussian process UCB methods. This talk is based on work to appear in NeurIPS 2021.
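
    A toy sketch of the version-space idea behind such algorithms, for the simplest realizable class (1-D thresholds): sample hypotheses consistent with the labels seen so far and query where the samples disagree most. For thresholds the version space is just an interval, so uniform sampling stands in for the hit-and-run samplers needed in general; every detail of the setup is illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Realizable toy problem: the ground truth is a threshold t*, and
    # label(x) = 1 if x >= t*. After observing labels, the version space
    # (set of consistent thresholds) is an interval [lo, hi].
    t_star = 0.37

    lo, hi = 0.0, 1.0
    for step in range(10):
        samples = rng.uniform(lo, hi, size=256)   # sampled consistent hypotheses
        # Query the point where sampled hypotheses disagree the most:
        # for threshold classifiers that is their median.
        query = float(np.median(samples))
        label = int(query >= t_star)
        if label == 1:
            hi = query        # t* must lie at or below the query
        else:
            lo = query        # t* must lie above the query
        print(f"step {step}: version space = [{lo:.4f}, {hi:.4f}]")
    ```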

    Bio: Aditi Krishnapriyan is the 2020 Alvarez Fellow in Computing Sciences at Lawrence Berkeley National Laboratory and UC Berkeley. Her research interests include combining domain-driven scientific mechanistic modeling with data-driven machine learning methodologies to accelerate and improve spatial and temporal modeling. Previously, she received a PhD at Stanford University, supported by the Department of Energy Computational Science Graduate Fellowship. During her PhD, she also spent time working on machine learning research at Los Alamos National Laboratory, Toyota Research Institute, and Google Research.

    Talk Title: Integrating machine learning with physics-based spatial and temporal modeling

    Talk Abstract: Deep learning has achieved great success in numerous areas, and is also seeing increasing interest in scientific applications. However, challenges still remain: scientific phenomena are difficult to model, and can also be limited by a lack of training data. As a result, scientific machine learning approaches are being developed by incorporating domain knowledge into the machine learning process to enable more accurate and general predictions. One such popular approach, colloquially known as physics-informed neural networks (PINNs), incorporates domain knowledge as soft constraints on an empirical loss function. I will discuss the challenges associated with such an approach, and show that by changing the learning paradigm to curriculum regularization or sequence-to-sequence learning, we can achieve significantly lower error. Another approach, colloquially known as ODE-Nets, aims to couple dynamical systems/numerical methods with neural networks. I will discuss how exploiting techniques from numerical analysis for these systems can enable learning continuous, function-to-function mappings for scientific problems.
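
    A minimal sketch of the soft-constraint idea behind PINNs, stripped to its simplest form: fit a function so that a differential-equation residual and a boundary condition are jointly penalized. Here a polynomial ansatz (rather than a neural network) makes the objective a linear least-squares problem; the ODE and all settings are illustrative assumptions, not the talk's examples.

    ```python
    import numpy as np

    # Physics-informed fit of the ODE u'(x) = -u(x), u(0) = 1 on [0, 1],
    # whose exact solution is exp(-x). With u(x) = sum_k c_k x^k, the
    # soft-constrained objective
    #     minimize  mean_j (u'(x_j) + u(x_j))^2  +  (u(0) - 1)^2
    # is linear least squares in the coefficients c.
    K, N = 8, 100
    xs = np.linspace(0.0, 1.0, N)

    X = np.vander(xs, K + 1, increasing=True)       # X[j, k] = x_j^k
    Xd = np.zeros_like(X)                           # Xd[j, k] = k * x_j^(k-1)
    Xd[:, 1:] = X[:, :-1] * np.arange(1, K + 1)

    residual_rows = (Xd + X) / np.sqrt(N)           # ODE residual term
    boundary_row = np.zeros((1, K + 1))
    boundary_row[0, 0] = 1.0                        # u(0) = c_0

    A = np.vstack([residual_rows, boundary_row])
    b = np.concatenate([np.zeros(N), [1.0]])
    c, *_ = np.linalg.lstsq(A, b, rcond=None)

    u = X @ c
    print("max error vs exp(-x):", np.max(np.abs(u - np.exp(-xs))))
    ```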

    Bio: Amanda Kube is a Ph.D. Candidate in the Division of Computational and Data Sciences at Washington University in St. Louis working with Dr. Sanmay Das in the Department of Computer Science and Dr. Patrick Fowler in the Brown School. She received her B.S. in Psychological and Brain Sciences and Mathematics with a concentration in Statistics from Washington University in St. Louis where she also received an M.S. in Data Analytics and Statistics. Her research interests involve the intersection of computation and the social sciences. Her current work combines machine learning and human decision-making to inform fair and efficient service allocations for homeless families.

    Talk Title: Integrating Human Priorities and Data-Driven Improvements in Allocation of Scarce Homeless Services to Households in Need

    Talk Abstract: Homelessness is a major public health issue in the United States that has gained visibility during the COVID-19 pandemic. Despite efforts at the federal level, rates of homelessness are not decreasing. Homeless services are a scarce public resource and current allocation systems have not been thoroughly investigated. Algorithmic techniques excel at modeling complex interactions between features and therefore have potential to model effects of homeless services at the individual level. These models can reason counterfactually about the effects of different services on each household and resulting predictions can be used for matching households to services. The ability to model heterogeneity in treatment effects of services provides the potential for “precision public health” where allocation of services is based on data-driven predictions of which service will lead to better outcomes. I discuss the scarce resource allocation problem as it applies to homeless service delivery, and the ability to improve upon the current allocation system using algorithmic techniques. I compare prediction algorithms to each other as well as to the ability of the general public to make these decisions. As homeless services are scarce public goods, it is vital to ensure allocations are not only efficient, but fair and ethical. I discuss efforts to ensure fair decisions and to understand how people prioritize households who should receive scarce homeless services. I also discuss future work and next steps as well as policy implications.

    Bio: Lihua Lei is a postdoctoral researcher in Statistics at Stanford University, advised by Professor Emmanuel Candès. His current research focuses on developing rigorous statistical methodologies for uncertainty quantification and calibration. Prior to joining Stanford, he obtained his Ph.D. in statistics at UC Berkeley, working on causal inference, multiple hypothesis testing, network analysis, stochastic optimization, and econometrics. His personal website is https://lihualei71.github.io/. 

    Talk Title: Distribution-Free Assessment of Population Overlap in Observational Studies

    Talk Abstract: Overlap in baseline covariates between treated and control groups, also known as positivity or common support, is one of the most fundamental assumptions in observational causal inference. Assessing this assumption is often ad hoc, however, and can give misleading results. For example, the common practice of examining the empirical distribution of estimated propensity scores is heavily dependent on model specification and has poor uncertainty quantification. In this paper, we propose a formal statistical framework for assessing the extrema of the population propensity score; e.g., the propensity score lies in [0.1, 0.9] almost surely. We develop a family of upper confidence bounds, which we term O-values, for this quantity. We show these bounds are valid in finite samples so long as the observations are independent and identically distributed, without requiring any further modeling assumptions on the data generating process. We also use extensive simulations to show that these bounds are reasonably tight in practice. Finally, we demonstrate this approach using several benchmark observational studies, showing how to build our proposed method into the observational causal inference workflow.

    Bio: Konstantin Mishchenko received his double-degree MSc from Paris-Dauphine and École normale supérieure Paris-Saclay in 2017. He did his PhD under the supervision of Peter Richtárik, and had research internships at Google Brain and Amazon. Konstantin has been recognized as an outstanding reviewer for NeurIPS 2019, ICML 2020, AAAI 2020, ICLR 2021, and ICML 2021. He has published 8 conference papers at ICML, NeurIPS, AISTATS, and UAI, 1 journal paper at SIOPT, and 6 workshop papers, and has co-authored 8 preprints, some of which are currently under peer review. In 2021, Konstantin is joining the group of Alexandre d'Aspremont and Francis Bach in Paris as a Postdoctoral Researcher.

    Talk Title: Optimization for Federated Learning

    Talk Abstract: Optimization has been a vital tool for enabling the success of machine learning. In the recently introduced paradigm of federated learning, where devices or organizations unite to train a model without revealing their private data, optimization has been particularly nontrivial. The peculiarities of federated learning that make it difficult include unprecedented privacy constraints, the difficulty of communication with a server, and high heterogeneity of the data across the participating parties. Nevertheless, the potential applications of federated learning, such as machine learning for health care, banking, and smartphones, have sparked global interest in the problem and quick growth in the number of publications.

    In this talk, we will discuss some of the recent advances in optimization for federated learning. We will formulate the key challenges in communication efficiency and personalization and propose ways for tackling them that are motivated by theory. To this end, we will discuss the convergence properties of some existing and new federated learning algorithms that leverage on-device (local) iterations as a way to limit communication.
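
    A minimal sketch of the "local iterations limit communication" idea: federated averaging for linear regression, where each client takes several local gradient steps and only averaged models cross the network. The data, client count, step counts, and learning rate are illustrative assumptions, not an algorithm from the talk.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    d, n_clients, n_local = 10, 8, 20
    w_true = rng.normal(size=d)

    # Heterogeneous client datasets (different input distributions).
    clients = []
    for i in range(n_clients):
        Xi = rng.normal(loc=0.3 * i, size=(50, d))
        yi = Xi @ w_true + 0.1 * rng.normal(size=50)
        clients.append((Xi, yi))

    w = np.zeros(d)
    for rnd in range(30):                      # communication rounds
        local_models = []
        for Xi, yi in clients:
            wi = w.copy()
            for _ in range(n_local):           # local steps, no communication
                grad = 2 * Xi.T @ (Xi @ wi - yi) / len(yi)
                wi -= 0.01 * grad
            local_models.append(wi)
        w = np.mean(local_models, axis=0)      # server averages the models

    print("distance to w_true:", np.linalg.norm(w - w_true))
    ```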

    Bio: Faidra Monachou is a final-year Ph.D. candidate in Management Science and Engineering at Stanford University. She is interested in market and information design, with a particular focus on the interplay between policy design and discrimination in education and labor. Faidra’s research has been supported by various scholarships and fellowships from Stanford Data Science, Stanford HAI, Google, and other organizations. She co-chaired the MD4SG’20 workshop and co-organizes the Stanford Data Science for Social Good program. Faidra received her undergraduate degree in Electrical and Computer Engineering from the National Technical University of Athens in Greece.

    Talk Title: Discrimination, Diversity, and Information in Selection Problems

    Talk Abstract: Despite the large empirical literature on disparities in college admissions, our theoretical understanding is limited. In this talk, I will introduce a theoretical framework to study how a decision-maker concerned with both merit and diversity, selects candidate students under imperfect information, limited capacity, and legal constraints. Motivated by recent decisions to drop standardized testing in admissions, we apply this framework to study how information differences lead to disparities across equally skilled groups and quantify the trade-off between information and access in test-free and test-based policies with and without affirmative action. Using application and transcript data from the University of Texas at Austin, we illustrate that there exist practical settings where dropping standardized testing improves or worsens both merit and diversity.

    Furthermore, we extend this model to demonstrate how privilege differences lead to intra-group disparities and establish that the direction of discrimination at the observable level may differ from the unobservable level. We compare common policies used in practice and take an optimization approach to design an optimal policy under legal constraints.

    Bio: Omar Montasser is a fifth year PhD student at TTI-Chicago advised by Nathan Srebro. His main research interest is the theory of machine learning. Recently, his research focused on understanding and characterizing adversarially robust learning, and designing algorithms with provable robustness guarantees under different settings. His work has been recognized by a best student paper award at COLT (2019).

    Talk Title: What, How, and When Can We Learn Adversarially Robustly?

    Talk Abstract: In this talk, we will discuss the problem of learning an adversarially robust predictor from clean training data. That is, learning a predictor that performs well not only on future test instances, but also when these instances are corrupted adversarially. There has been much empirical interest in this question, and in this talk we will take a theoretical perspective and see how it leads to practically relevant insights, including: the need to depart from an empirical (robust) risk minimization approach, and thinking of what kind of accesses and reductions can allow learning.

    Bio: Jeffrey Negrea is a 5th year Ph.D. candidate and Vanier scholar at the University of Toronto in the department of Statistical Sciences, and a graduate student researcher at the Vector Institute, working with Daniel Roy on foundational problems in computational statistics, machine learning, and sequential decision making. His research focuses on questions of reliability and robustness for statistical and machine learning methods. His contributions are broad: he has recent work addressing robustness to the IID assumption in sequential decision making, the role of regularization in statistical learning, the connection between stochastic optimization and uncertainty quantification, and approximation methods in MCMC. Previously, Jeff completed his B.Math. at the University of Waterloo, and his M.Sc. in Statistics at the University of Toronto.

    Talk Title: Adapting to failure of the IID assumption for sequential prediction

    Talk Abstract: We consider sequential prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. We quantify relaxations of the classical IID assumption in terms of these constraint sets, with IID sequences at one extreme and adversarial mechanisms at the other. The Hedge algorithm, long known to be minimax optimal in the adversarial regime, was recently shown to be minimax optimal for IID data. We show that Hedge with deterministic learning rates is suboptimal between these extremes, and present new algorithms that adaptively achieve the minimax optimal rate of regret with respect to our relaxations of the IID assumption, and do so without knowledge of the underlying constraint set. We analyze our algorithm using the follow-the-regularized-leader framework, and prove it corresponds to Hedge with adaptive learning rates.
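
    For reference, a minimal sketch of the classical Hedge algorithm with a fixed, deterministic learning rate, the baseline whose suboptimality between the IID and adversarial extremes motivates the talk's adaptive algorithms. The loss model below is an illustrative IID instance.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    # Hedge (exponential weights) for prediction with expert advice,
    # using the classical fixed rate eta = sqrt(2 ln(K) / T).
    K, T = 10, 5000
    eta = np.sqrt(2 * np.log(K) / T)

    # IID Bernoulli losses in {0, 1}; expert 0 is best on average.
    means = np.linspace(0.3, 0.7, K)
    losses = rng.uniform(size=(T, K)) < means

    weights = np.ones(K)
    learner_loss = 0.0
    for t in range(T):
        p = weights / weights.sum()            # distribution over experts
        learner_loss += p @ losses[t]          # expected loss this round
        weights *= np.exp(-eta * losses[t])    # exponential weight update

    best_expert_loss = losses.sum(axis=0).min()
    print("regret:", learner_loss - best_expert_loss)
    ```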

    Bio: Abhilasha is a Ph.D. student at Carnegie Mellon University, working in the Language Technologies Institute. Their research focuses on understanding neural model performance, and consequently developing robust and trustworthy NLP technologies. They have published papers in top-tier NLP conferences and have been the recipient of the outstanding reviewer awards at ACL and EMNLP. Their work has also received the “Area Chair Favorite Paper” award at COLING 2018. In the past, they interned at Allen Institute for AI and Microsoft Research, where they worked on understanding how deep learning models process challenging semantic phenomena in natural language.

    Talk Title: Developing User-Centric Models for Question Answering

    Talk Abstract: Everyday users now benefit from powerful QA technologies in a range of consumer-facing applications. Voice assistants such as Amazon Alexa or Google Home have brought natural language technologies to several million homes globally. Yet, even with millions of users now interacting with these technologies on a daily basis, there has been surprisingly little research attention devoted to studying the issues that arise when people use QA systems. Traditional QA evaluations do not reflect the needs of many users who can benefit from QA technologies. For example, users with a range of visual and motor impairments now rely extensively on voice interfaces for efficient text entry. Keeping these needs in mind, we construct evaluations considering the interfaces through which users interact with QA systems. We analyze and mitigate errors introduced by three interface types that could be connected to a QA engine: speech recognizers converting spoken queries to text, keyboards used to type queries into the system, and translation systems processing queries in other languages. Our experiments and insights present a useful starting point for both practitioners and researchers, to develop usable question-answering systems.

    Bio: Alexander Rodriguez is a Ph.D. student in Computer Science at Georgia Tech, advised by Prof. B. Aditya Prakash. His research interests include data science and AI, with an emphasis on time-series and real-world network problems motivated by epidemiology and community resilience. In response to COVID-19, he has been the student lead of his research group's efforts in forecasting the progression of the pandemic, and these predictions have been featured on the CDC's website and FiveThirtyEight.com. His work has been published at AAAI, KDD, NeurIPS, and BigData, and has been awarded 1st place in the Facebook/CMU COVID-19 Challenge and 2nd place in the C3.ai COVID-19 Grand Challenge. He has also served as a workshop organizer for BPDM @ KDD 2017 and epiDAMIK @ KDD 2021.

    Talk Title: Deep Learning Frameworks for Epidemic Forecasting

    Talk Abstract: Our vulnerability to emerging infectious diseases has been illustrated with the devastating impact of the COVID-19 pandemic. Forecasting epidemic trajectories (such as future incidence over the next four weeks) gives policymakers a valuable input for designing effective healthcare policies and optimizing supply chain decisions. However, this is a non-trivial task with multiple open questions. In this talk, I will present our neural frameworks for epidemic forecasting, using seasonal influenza and COVID-19 as examples. I will introduce our efforts in three research directions: (1) awareness of multiple facets of the epidemic dynamics, (2) coping with challenges from using public health data, and (3) readiness to provide actionable forecasts and insights. I will first discuss our deployed model for predicting COVID-associated indicators, which has been recognized as a top short-term forecasting model among all models submitting predictions to the CDC. I will also introduce how to use deep learning to adapt a historical flu model to an emerging scenario where COVID and flu coexist by leveraging auxiliary data sources. Next, I will introduce deep learning frameworks for incorporating expert guidance, principled uncertainty quantification for well-calibrated forecasts, and handling data revisions for refining forecasts. Finally, I will share some future research directions.

    Bio: Martin Saveski is a postdoctoral scholar in the Management Science and Engineering department at Stanford University. He completed his Ph.D. at MIT in September 2020. Martin's broad research area is computational social science. He uses causal inference and social network analysis to study pressing social problems online, such as political polarization and toxicity. He has also made methodological contributions in the areas of causal inference in networks and recommender systems. Previously, he has interned at Facebook, LinkedIn, Amazon, and Yahoo. His work has been covered by major media outlets, including the New York Times, NPR, MIT Tech Review, and others.

    Talk Title: Engaging Politically Diverse Audiences on Social Media

    Talk Abstract: In this talk, I will present our study of how political polarization is reflected in the language used by media outlets to promote their content online and what we can do to reduce it. We tracked the Twitter posts of several media outlets over the course of more than three years (566K tweets), and the engagement with these tweets from other users (104M retweets). We then used this data to model the relationship between the tweet text and the political diversity of the audience. We built a tool that integrates our models and helps journalists craft tweets that are engaging to a politically diverse audience, guided by the model predictions. To test the real-world impact of the tool, we partnered with the award-winning PBS documentary series Frontline and ran a series of advertising experiments on Twitter testing how tens of thousands of users respond to the tweets. We found that in seven out of the ten experiments, the tweets selected by our model were indeed engaging to a more politically diverse audience, illustrating the effectiveness of our tool. I will close by discussing the methodological challenges and opportunities in using advertisements to test interventions on social media platforms.

    Bio: Liyue Shen is a PhD student in the Electrical Engineering Department at Stanford University, co-advised by Professor John Pauly and Professor Lei Xing. Her research focuses on medical AI, spanning the interdisciplinary areas of machine learning, computer vision, and medical imaging to deepen our understanding of human health and improve image-guided clinical care, especially for solving real-world problems in cancer patient treatment and radiation therapy. Her work has been published in both conferences (ICCV, CVPR) and journals (Nature Biomedical Engineering, IEEE TMI). She is the recipient of the Stanford Bio-X Bowes Graduate Student Fellowship (2019-2021).

    Talk Title: Exploiting Prior Knowledge in Physical World Incorporated with Machine Learning for Solving Medical Imaging Problems

    Talk Abstract: Medical imaging is crucial for image-guided clinical patient care. In my research in the interdisciplinary area of medical AI, I develop efficient machine learning algorithms for medical imaging by exploiting prior knowledge from the physical world (exploit what you know) and incorporating it into machine learning models.

    I will present two main directions of my research. First, data-driven machine learning methods often suffer from limitations in generalizability, reliability, and interpretability. By exploiting geometry and physics priors from the imaging system, I proposed physics-aware and geometry-informed deep learning frameworks for radiation-reduced sparse-view CT and accelerated MR imaging. Incorporating geometry and physics priors, the trained deep networks show more robust generalization across patients and better interpretability. Second, motivated by the unique characteristic of medical imaging that patients are often scanned serially over time during clinical treatment, so that earlier images provide abundant prior knowledge of the patient's anatomy, I proposed a prior embedding method to encode internal information of image priors through coordinate-based neural representation learning. Since this method requires no training data from external subjects, it relaxes the burden of data collection and can be easily generalized across different imaging modalities and anatomies. Following this, I developed a novel algorithm of temporal neural representation learning for longitudinal studies. Combining both physics priors and image priors, I showed that the proposed algorithm can successfully capture subtle yet significant structural changes such as tumor progression in sparse-sampling image reconstruction, which can be applied to tackle real-world challenges in cancer patient treatment and radiation therapy.

    Bio: Guanya Shi received a B.E. in mechanical engineering (summa cum laude) from Tsinghua University in 2017. He is currently working toward a Ph.D. degree in computing and mathematical sciences at the California Institute of Technology. He was a deep learning research intern at NVIDIA in 2020. His research interests center on the intersection of machine learning and control theory, spanning the spectrum from theory and foundations to algorithm design, with the aim of solving cutting-edge problems and demonstrating new capabilities in robotics and autonomy. Guanya was the recipient of several awards, including the Simoudis Discovery Prize and the WAIC Yunfan Award.

    Talk Title: Safety-Critical Learning and Control in Dynamic Environments: Towards Unified Theory and Learned Robotic Agility

    Talk Abstract: Deep-learning-based methods have made exciting progress in many decision-making problems such as playing complicated strategy games. However, for complex real-world settings, such as agile robotic control in hazardous or poorly-sensed environments (e.g., autonomous driving), end-to-end deep-learning-based methods are often unreliable. In this talk, I will first present the Neural-Control Family, which is a family of nonlinear deep-learning-based control methods with stability, safety, and robustness guarantees. The Neural-Control Family bridges learning and control theory in a unified framework, and demonstrates new capabilities in agile robot control (e.g., agile flight maneuvers in unknown strong wind conditions). In the second part, I will discuss progress towards establishing clean interfaces that fundamentally connect learning and control. A strong focus will be on non-asymptotic analysis for online learning and control. In particular, we will discuss the intersection of representation learning and adaptive control, no-regret and competitive control, and safe exploration in dynamical systems.

    Bio: Tyler is a Ph.D. candidate in Computer Science at the University of Chicago, advised by Kyle Chard and Ian Foster. His research interests lie at the intersection of data management, data science, and HPC, focusing on enabling scientists to maximize the utility of massive amounts of data. His work has culminated in the design of the open-source system Xtract that can intelligently formulate metadata extraction workflows for data stored in heterogeneous file formats across leadership-scale computing facilities. Before joining the University of Chicago, he received his B.A. in Applied Mathematics and Statistics from Macalester College.

    Talk Title: Enabling Data Utility Across the Sciences

    Talk Abstract: Scientific data repositories are generally chaotic—files spanning heterogeneous domains, studies, and users are stuffed into an increasingly-unsearchable data swamp without regard for organization, discoverability, or usability. Files that could contribute to scientists’ future research may be spread across storage facilities and submerged beneath petabytes of other files, rendering manual annotation and navigation virtually impossible. To remedy this lack of navigability, scientists require a rich search index of metadata, or data about data, extracted from individual files. In this talk, we will explore automated metadata extraction workflows for converting dark data swamps into navigable data collections, given no prior knowledge regarding each file’s schema or provenance. I enable such extraction from files of vastly different structures by building a robust suite of “extractors” that leverage data scientific methods (e.g., keyword analysis, entity recognition, and file type identification) in order to maximize our body of knowledge about a diversity of files.

    In this talk, I outline the construction, optimization, and evaluation of Xtract—a scalable metadata extraction system—that automatically constructs extraction plans for files distributed across remote cyberinfrastructure. I illustrate the scale challenges in processing these data, and outline techniques to maximize extraction throughput, by analyzing Xtract’s performance on four real science data sets. Finally, I will present early results of a user study in which I directly evaluate the extent to which automatically extracted metadata enables data utility in scientific research processes.
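
    A hypothetical sketch of the extractor-suite pattern described above: identify a file's type and dispatch to a matching metadata extractor, tolerating malformed files along the way. The function names and the extension-based dispatch rule are invented for illustration and are not Xtract's API.

    ```python
    import json
    import pathlib
    from collections import Counter

    def extract_json(path):
        data = json.loads(path.read_text())
        return {"type": "json", "top_level_keys": sorted(data)[:10]}

    def extract_text(path):
        words = path.read_text().lower().split()
        return {"type": "text",
                "keywords": [w for w, _ in Counter(words).most_common(5)]}

    # Dispatch table: one extractor per recognized file type.
    EXTRACTORS = {".json": extract_json, ".txt": extract_text, ".md": extract_text}

    def build_index(root):
        index = {}
        for path in pathlib.Path(root).rglob("*"):
            extractor = EXTRACTORS.get(path.suffix)
            if path.is_file() and extractor:
                try:
                    index[str(path)] = extractor(path)
                except Exception as err:       # tolerate malformed files
                    index[str(path)] = {"error": str(err)}
        return index

    print(json.dumps(build_index("."), indent=2)[:500])
    ```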

    Bio: Jennifer is a PhD candidate in Computing and Mathematical Sciences at Caltech, advised by Pietro Perona and Yisong Yue. She is interested in studying methods to integrate domain knowledge with machine learning approaches for scientific applications. Her current work is at the intersection of machine learning and behavior analysis, with projects on learning behavioral representations, social behavior recognition, interpretable modeling, and keypoint discovery. With the Kennedy Lab at Northwestern, she organized an interdisciplinary workshop on behavior modeling to connect researchers across science and machine learning. Her work is supported by the Kortschak Scholars Program and a NSERC Fellowship.

    Talk Title: AI for Science: Learning from Experts and Data

    Talk Abstract: In many fields, the amount of recorded scientific data is increasing much faster than researchers can analyze and interpret it. For example, recorded videos of animal behavior over a few days can take domain experts months to analyze. Innovations in data science, such as machine learning, provide a promising direction for enabling scientists to scalably perform data-driven experiments. However, scientific applications raise a number of challenges for existing methods: data creation is expensive, model interpretability is important, and tools are often needed to translate algorithmic improvements into practical benefits.

    To address these challenges, my current work has focused on incorporating domain knowledge into machine learning to reduce the human effort required for data analysis. I will discuss methods to improve the sample-efficiency and interpretability of models in the context of behavior modeling. To learn annotation-efficient representations, we developed a framework that unifies self-supervision with weak programmatic supervision from domain experts. We demonstrated that our method reduces annotation requirements by up to a factor of 10 without compromising accuracy, compared to previous approaches. Furthermore, we investigate program synthesis as a promising direction for producing interpretable descriptions of behavior, and we integrate the interpretable programs from our method with an existing tool in behavioral neuroscience. These interdisciplinary, expert-in-the-loop approaches are important for broadening the application of data science across scientific domains.
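
    The unified framework itself is not reproduced here, but the flavor of weak programmatic supervision can be conveyed with a generic Python sketch: domain experts encode heuristics as labeling functions, and their noisy votes become training labels. The features, thresholds, and simple majority vote below are invented stand-ins, not the authors’ method.

        # Generic sketch of weak programmatic supervision (not the authors' framework):
        # expert heuristics vote on each frame of pose-tracking data to produce
        # noisy "attack" labels that can supplement self-supervised pretraining.
        import numpy as np

        def lf_close_distance(frame):   # heuristic: animals very close together
            return 1 if frame["dist"] < 0.1 else 0

        def lf_fast_approach(frame):    # heuristic: rapid decrease in distance
            return 1 if frame["dist_change"] < -0.05 else 0

        def lf_low_speed(frame):        # heuristic: both animals nearly still -> not attack
            return 0 if frame["speed"] < 0.01 else -1   # -1 means "abstain"

        LABELING_FUNCTIONS = [lf_close_distance, lf_fast_approach, lf_low_speed]

        def weak_label(frame):
            votes = [lf(frame) for lf in LABELING_FUNCTIONS]
            votes = [v for v in votes if v != -1]       # drop abstentions
            return int(np.mean(votes) >= 0.5) if votes else None

        frames = [{"dist": 0.05, "dist_change": -0.1, "speed": 0.4},
                  {"dist": 0.9,  "dist_change":  0.0, "speed": 0.0}]
        print([weak_label(f) for f in frames])          # -> [1, 0]

    Labels produced this way are noisy but essentially free, which is why combining them with self-supervised representations can cut annotation requirements without giving up accuracy.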

    Bio: Wei is a PhD candidate in the Department of Computer Science & Engineering at Washington University in St. Louis, advised by Chien-Ju Ho. His research interests are in online learning, algorithmic economics, optimization, and behavioral experiments, with a focus on developing theoretically rigorous, empirically grounded frameworks to understand and design human-centered algorithms. He received his B.E. from Tianjin University in 2017.

    Talk Title: Learning with Understanding: Human Behavior in Algorithm Design

    Talk Abstract: Algorithms increasingly pervade every sphere of human life and thus have great potential to reshape many sectors of modern society. It is therefore important to understand the role humans play in algorithm design. However, human involvement also creates unique challenges: humans may be careless, strategic, or subject to behavioral biases.

    In this talk, I will present two works from my own research on theoretically and empirically addressing these challenges. First, I will describe the problem of learning from biased human feedback, in which the learner cannot directly observe the realized reward of an action but only a human’s biased report of it. I explored two natural models of human feedback. Our results show that a small deviation in the user behavior model and/or in the design of the information structure can significantly impact the overall system outcome.

    I then step back and use behavioral experiments to examine whether standard behavior models capture human behavior in practice. I studied this question in AI-assisted decision-making, where an AI distills useful information from a large amount of data and humans review the AI’s output to make the final decision. I ran behavioral experiments to characterize people’s responses in practice and established an empirically grounded model of human behavior.
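
    As a toy illustration of the biased-feedback setting above (not the feedback models analyzed in this work), consider a bandit learner in Python that never observes realized rewards, only a thresholded human rating; the threshold rule is an invented stand-in.

        # Toy sketch of learning from biased human feedback (invented feedback model,
        # not the speaker's): the learner never observes the realized reward of an
        # arm, only a binary "satisfied / not satisfied" signal from a human who
        # applies a hidden threshold.
        import numpy as np

        rng = np.random.default_rng(0)
        true_means = [0.3, 0.6, 0.5]             # hidden mean rewards of three arms
        THRESH = 0.45                            # human reports 1 iff reward > THRESH

        counts = np.zeros(3)
        feedback_sums = np.zeros(3)

        for t in range(5000):
            if rng.random() < 0.1:               # epsilon-greedy exploration
                arm = int(rng.integers(3))
            else:
                arm = int(np.argmax(feedback_sums / np.maximum(counts, 1)))
            reward = rng.normal(true_means[arm], 0.2)  # realized reward (unobserved)
            signal = float(reward > THRESH)            # biased human feedback
            counts[arm] += 1
            feedback_sums[arm] += signal

        print("empirical satisfaction rates:",
              np.round(feedback_sums / np.maximum(counts, 1), 2))
        # The learner can still rank arms, but its estimates reflect the human's
        # threshold rather than the reward scale; small changes to the feedback
        # model can change which arm looks best.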

    Bio: I am a postdoctoral researcher in the Machine Learning Foundations group at Microsoft Research Redmond. I received my PhD from the EECS department at MIT, advised by Prof. Piotr Indyk, in September 2020. Before that, I completed my MSc in computer science and mathematics at the Weizmann Institute, advised by Prof. Uriel Feige, and my BSc in computer science and mathematics at the Technion – Israel Institute of Technology. During my PhD I spent time as a research intern at Microsoft, Amazon, and VMware.

    Talk Title: On the Role of Data in Algorithm Design

    Talk Abstract: Recently, there has been a growing interest in harnessing the power of big datasets and modern machine learning for designing new scalable algorithms. This invites us to rethink the role of data in algorithm design: not just the input to pre-defined algorithms, but also a factor that enters the algorithm design process itself, driving it in a strong and possibly automated manner. In this talk, I will describe my work on data-driven and learning-based algorithms for high-dimensional metric spaces and nearest neighbor search. In particular, I will show that data-dependence is necessary for optimal compressed representations of high-dimensional Euclidean distances, and that neural networks can be used to build better data structures for nearest neighbor search.
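
    As a stripped-down illustration of data-dependence in this sense, one can fit a partition to the dataset itself and probe only the cells closest to a query. Plain k-means stands in below for a learned model; this generic Python sketch is not the construction from the talk.

        # Stripped-down illustration of data-dependent nearest neighbor search
        # (k-means partitioning as a stand-in for a learned model): probe only
        # the closest cells instead of scanning all points.
        import numpy as np

        rng = np.random.default_rng(1)
        data = rng.standard_normal((10_000, 32))
        k = 64
        centers = data[rng.choice(len(data), k, replace=False)].copy()

        def assign_cells(x):
            # Squared Euclidean distance of every point to every cell center.
            d2 = (x**2).sum(1, keepdims=True) - 2 * x @ centers.T + (centers**2).sum(1)
            return np.argmin(d2, axis=1)

        for _ in range(10):                 # a few rounds of Lloyd's algorithm
            assign = assign_cells(data)
            for c in range(k):
                pts = data[assign == c]
                if len(pts):
                    centers[c] = pts.mean(0)
        assign = assign_cells(data)         # final assignment

        def query(q, n_probe=4):
            # Rank cells by center distance and scan only the closest few.
            order = np.argsort(((centers - q) ** 2).sum(1))[:n_probe]
            cand = np.flatnonzero(np.isin(assign, order))
            return cand[np.argmin(((data[cand] - q) ** 2).sum(1))]

        q = rng.standard_normal(32)
        print("approximate:", query(q),
              "exact:", int(np.argmin(((data - q) ** 2).sum(1))))

    Because the partition is fit to the data distribution, a query typically scans only a small fraction of the points; replacing the hand-rolled partitioner with a neural network is the direction the talk explores.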

  • Workshop

    Workshop Agenda (as of 10/26/21)

    Speaker Bios:

    Opening Remarks (11/11)

    Rebecca Willett is a Professor of Statistics and Computer Science at the University of Chicago. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005 and was an Assistant and then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. She was an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison from 2013 to 2018. Prof. Willett received the National Science Foundation CAREER Award in 2007, was a member of the DARPA Computer Science Study Group from 2007 to 2011, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. Prof. Willett has also held visiting researcher positions at the Institute for Pure and Applied Mathematics at UCLA in 2004, the University of Wisconsin-Madison from 2003 to 2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Medical Systems (now GE Healthcare) in 2002. Her research interests include network and imaging science with applications in medical imaging, wireless sensor networks, astronomy, and social networks. She is also an instructor for FEMMES (Females Excelling More in Math, Engineering, and Science) and a local exhibit leader for Sally Ride Festivals. She was a recipient of the National Science Foundation Graduate Research Fellowship, the Rice University Presidential Scholarship, the Society of Women Engineers Caterpillar Scholarship, and the Angier B. Duke Memorial Scholarship.

    Homepage

    Panel: The Academic Job Search – Perspectives from Recent Hires (11/11)

    I am an Assistant Professor of Computer Science at the University of Chicago. I founded and direct 3DL (threedle!), a group of enthusiastic researchers passionate about 3D, machine learning, and visual computing. I obtained my Ph.D. in 2021 from Tel Aviv University under the supervision of Daniel Cohen-Or and Raja Giryes.

    My research is focused on building artificial intelligence for 3D data, spanning the fields of computer graphics, machine learning, and computer vision. Deep learning, the most popular form of artificial intelligence, has unlocked remarkable success on structured data (such as text, images, and video), and I am interested in harnessing the potential of these techniques to enable effective operation on unstructured 3D geometric data.

    We have developed a convolutional neural network designed specifically for meshes, and also explored how to learn from the internal data within a single shape (for surface reconstruction, geometric texture synthesis, and point cloud consolidation), and I am interested in broader applications related to these areas. Additional research directions that I am aiming to explore include: intertwining human and machine-based creativity to advance our capabilities in 3D shape modeling and animation; learning with less supervision, for example to extract patterns and relationships from large shape collections; and making 3D neural networks more “interpretable/explainable”.

    I am an assistant professor in the Department of Statistics at the University of Chicago.

    Previously, I was a postdoctoral researcher at UC Berkeley, advised by Professor Martin Wainwright. I obtained my Ph.D. at Princeton University in 2020, advised by Professor Yuxin Chen and Professor Jianqing Fan. Prior to graduate school, I received my bachelor’s degree in Electrical Engineering from Tsinghua University in 2015.

    I am broadly interested in mathematics of data science, reinforcement learning, high-dimensional statistics, convex and nonconvex optimization as well as their applications to neuroscience.

    Website

    Audrey Sedal is a roboticist and Research Assistant Professor at the Toyota Technological Institute at Chicago.

    Website

    Chenhao Tan is an assistant professor at the Department of Computer Science and the Department of Information Science (by courtesy) at University of Colorado Boulder. His main research interests include language and social dynamics, human-centered machine learning, and multi-community engagement. He is also broadly interested in computational social science, natural language processing, and artificial intelligence.

    Homepage.

    I was previously a Distinguished Postdoctoral Researcher in the department of statistics at Columbia University, where I worked with the groups of David Blei and Peter Orbanz. I completed my Ph.D. in statistics at the University of Toronto, where I was advised by Daniel Roy. In a previous life, I worked on quantum computing at the University of Waterloo. I won a number of awards, including the Pierre Robillard award for best statistics thesis in Canada.

    I am an assistant professor of Statistics and Data Science at the University of Chicago and a research scientist at Google Cambridge. My recent work revolves around the intersection of machine learning and causal inference, as well as the design and evaluation of safe and credible AI systems. Other notable areas of interest include network data and the foundations of learning and statistical inference.

    Opportunities at the Data Science Institute (11/11)

    Julia Lane is the Executive Director of the Center for Data and Computing, responsible for shaping and executing the strategic vision of CDAC, building new research partnerships and outreach strategies to foster interdisciplinary collaborations, and ensuring that the University continues to broaden applications of data science and computing approaches.

    Ningzi Li received her doctoral degree in Sociology from Cornell University. Her research focuses on organizational theory and the sociology of strategy, in particular how social and institutional factors shape firm strategies. One stream of her work investigates the causes and consequences of inter-organizational networks over the course of institutional changes using a big-data approach. A second stream examines language as an essential component and representation of strategy, using natural experiments and NLP methods. She received the 2019 best paper award from the Canadian Sociological Association’s Economic Sociology Research Cluster.

    Her CV is here.

    Jamie Saxon joined CDAC as a postdoctoral scholar in summer 2020; he was previously a postdoctoral fellow with the Harris School of Public Policy and the Center for Spatial Data Science of the University of Chicago.

    He uses large data sources to measure the availability and use of civic and social resources in American cities. He is particularly interested in mobility among neighborhoods and the consequences of this mobility. He has also studied how gerrymandering affects representation, and developed powerful automated districting software.

    He is committed to developing resources for computational social science research, and has taught programming and statistics to master’s students in public policy.

    He was trained as a particle physicist and was previously an Enrico Fermi Fellow on the ATLAS Experiment on CERN’s Large Hadron Collider at the Enrico Fermi Institute. He worked for many years on electronics and firmware for measuring and reconstructing particle trajectories. As a graduate student at the University of Pennsylvania, he made noteworthy contributions to the discovery and first measurements of the Higgs boson in the two-photon channel.

    His CV is here; you can also find him on LinkedIn or GitHub.

    David Uminsky joined the University of Chicago in September 2020 as a senior research associate and Executive Director of Data Science. He was previously an associate professor of Mathematics and Executive Director of the Data Institute at the University of San Francisco (USF). His research interests are in machine learning, signal processing, pattern formation, and dynamical systems. David is an associate editor of the Harvard Data Science Review. He was selected in 2015 by the National Academy of Sciences as a Kavli Frontiers of Science Fellow. He is also the founding Director of the BS in Data Science at USF and served as Director of the MS in Data Science program from 2014 to 2019. During the summer of 2018, David served as the Director of Research for the Mathematical Sciences Research Institute Undergraduate Program on the topic of Mathematical Data Science.

    Before joining USF he was a combined NSF and UC President’s Fellow at UCLA, where he was awarded the Chancellor’s Award for outstanding postdoctoral research. He holds a Ph.D. in Mathematics from Boston University and a BS in Mathematics from Harvey Mudd College.

    Panel: Early Career Advice (11/11)

    Raul Castro Fernandez is an Assistant Professor of Computer Science at the University of Chicago. In his research he builds systems for discovering, preparing, and processing data. The goal of his research is to understand and exploit the value of data. He often uses techniques from data management, statistics, and machine learning. His main effort these days is on building platforms to support markets of data. This is part of a larger research effort on understanding the Economics of Data. He’s part of ChiData, the data systems research group at The University of Chicago.

    Homepage.

    Marshini Chetty is an assistant professor in the Department of Computer Science at the University of Chicago, where she co-directs the Amyoli Internet Research Lab (AIR lab). She specializes in human-computer interaction, usable privacy and security, and ubiquitous computing. Marshini designs, implements, and evaluates technologies to help users manage different aspects of Internet use, from privacy and security to performance and costs. She often works in resource-constrained settings and uses her work to help inform Internet policy. She has a Ph.D. in Human-Centered Computing from the Georgia Institute of Technology, USA, and a Master’s and Bachelor’s in Computer Science from the University of Cape Town, South Africa. In her former lives, Marshini was on the faculty in the Computer Science Department at Princeton University and the College of Information Studies at the University of Maryland, College Park. Her work has won best paper awards at SOUPS, CHI, and CSCW and has been funded by the National Science Foundation, the National Security Agency, Intel, Microsoft, Facebook, and multiple Google Faculty Research Awards.

    Homepage.

    I enjoy developing and deploying programming language technology — type systems, synthesis algorithms, and other program analysis techniques — for applications in software engineering and human-computer interaction. Currently, the overarching theme of my research is to develop Direct Manipulation Programming Systems.

    Website

    Samantha Riesenfeld is an Assistant Professor of Molecular Engineering and of Genetic Medicine, a member of the Committee on Immunology, an Associate Member of the Comprehensive Cancer Center, and co-director of the new Computational and Systems Immunology PhD track in Immunology and Molecular Engineering. She leads an interdisciplinary research program focused on developing and applying genomics-based machine learning approaches to investigate the cellular components, transcriptional circuitry, and dynamics underlying complex biological systems, with a special interest in inflammatory immune responses and solid tumor cancer.

    Overview of the Data Science Institute (11/12)

    Dan Nicolae obtained his Ph.D. in statistics from The University of Chicago and has been a faculty member at the same institution since 1999, with appointments in Statistics (since 1999) and Medicine (since 2006). His research focuses on developing statistical and computational methods for understanding human genetic variation and its influence on the risk for complex traits, with an emphasis on asthma-related phenotypes. The current focus of his statistical genetics research is on data integration and system-level approaches using large datasets that include clinical and environmental data as well as various genetics/genomics data types: DNA variation, gene expression (RNA-seq), methylation, and microbiome.

    Homepage

    Keynote: Internet Equity & Access Research Initiative (11/12)

    Nick Feamster is Neubauer Professor in the Department of Computer Science and the College. He researches computer networking and networked systems, with a particular interest in Internet censorship, privacy, and the Internet of Things. His work on experimental networked systems and security aims to make networks easier to manage, more secure, and more available.

    Homepage

    Nicole Marwell is Associate Professor in the University of Chicago Crown Family School of Social Work, Policy, and Practice. She is also a faculty affiliate of the Department of Sociology, a faculty fellow at the Center for Spatial Data Science, and a member of the Faculty Advisory Council of the Mansueto Institute for Urban Innovation. Her research examines urban governance, with a focus on the diverse intersections between nonprofit organizations, government bureaucracies, and politics.

    Panel: Inside the Search Committee

    Michael J. Franklin is the inaugural holder of the Liew Family Chair of Computer Science. An authority on databases, data analytics, data management and distributed systems, he also serves as senior advisor to the provost on computation and data science.

    Previously, Franklin was the Thomas M. Siebel Professor of Computer Science and chair of the Computer Science Division of the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. There, he co-founded Berkeley’s Algorithms, Machines and People Laboratory (AMPLab), a leading academic big data analytics research center. The AMPLab won a National Science Foundation CISE “Expeditions in Computing” award, which was announced as part of the White House Big Data Research initiative in March 2012, and received support from over 30 industrial sponsors. AMPLab created industry-changing open source Big Data software including Apache Spark and BDAS, the Berkeley Data Analytics Stack. At Berkeley, he also served as an executive committee member for the Berkeley Institute for Data Science, a campus-wide initiative to advance data science environments.

    An energetic entrepreneur in addition to his academic work, Franklin founded and became chief technology officer of Truviso, a data analytics company acquired by Cisco Systems. He serves on the technical advisory boards of various data-driven technology companies and organizations.

    Franklin is a Fellow of the Association for Computing Machinery and a two-time recipient of the ACM SIGMOD (Special Interest Group on Management of Data) “Test of Time” award. His many other honors include the outstanding advisor award from Berkeley’s Computer Science Graduate Student Association. He received the Ph.D. in Computer Science from the University of Wisconsin in 1993, a Master of Software Engineering from the Wang Institute of Graduate Studies in 1986, and the B.S. in Computer and Information Science from the University of Massachusetts in 1983.

    Homepage

    Dan Nicolae obtained his Ph.D. in statistics from The University of Chicago and has been a faculty member at the same institution since 1999, with appointments in Statistics (since 1999) and Medicine (since 2006). His research focuses on developing statistical and computational methods for understanding human genetic variation and its influence on the risk for complex traits, with an emphasis on asthma-related phenotypes. The current focus of his statistical genetics research is on data integration and system-level approaches using large datasets that include clinical and environmental data as well as various genetics/genomics data types: DNA variation, gene expression (RNA-seq), methylation, and microbiome.

    Homepage

    David Uminsky joined the University of Chicago in September 2020 as a senior research associate and Executive Director of Data Science. He was previously an associate professor of Mathematics and Executive Director of the Data Institute at the University of San Francisco (USF). His research interests are in machine learning, signal processing, pattern formation, and dynamical systems. David is an associate editor of the Harvard Data Science Review. He was selected in 2015 by the National Academy of Sciences as a Kavli Frontiers of Science Fellow. He is also the founding Director of the BS in Data Science at USF and served as Director of the MS in Data Science program from 2014 to 2019. During the summer of 2018, David served as the Director of Research for the Mathematical Sciences Research Institute Undergraduate Program on the topic of Mathematical Data Science.

    Before joining USF he was a combined NSF and UC President’s Fellow at UCLA, where he was awarded the Chancellor’s Award for outstanding postdoctoral research. He holds a Ph.D. in Mathematics from Boston University and a BS in Mathematics from Harvey Mudd College.

  • Committee & Mentors

    Faculty Committee

    Yuxin Chen is an assistant professor in the Department of Computer Science at the University of Chicago. Previously, he was a postdoctoral scholar in Computing and Mathematical Sciences at Caltech, hosted by Prof. Yisong Yue. He received his Ph.D. degree in Computer Science from ETH Zurich, under the supervision of Prof. Andreas Krause. He is a recipient of the PIMCO Postdoctoral Fellowship in Computing and Mathematical Sciences, a Swiss National Science Foundation Early Postdoc.Mobility fellowship, and a Google European Doctoral Fellowship in Interactive Machine Learning.

    His research interest lies broadly in probabilistic reasoning and machine learning. He is currently working on developing interactive machine learning systems that involve active learning, sequential decision making, interpretable models, and machine teaching. You can find more information in his Google Scholar profile.

    Homepage.

    Nick Feamster is Neubauer Professor in the Department of Computer Science and the College. He researches computer networking and networked systems, with a particular interest in Internet censorship, privacy, and the Internet of Things. His work on experimental networked systems and security aims to make networks easier to manage, more secure, and more available.

    Homepage

    Risi Kondor is an Associate Professor in the Department of Computer Science, the Department of Statistics, and the Computational and Applied Mathematics Initiative at the University of Chicago. He joined the Flatiron Institute in 2019 as a Senior Research Scientist with the Center for Computational Mathematics. His research interests include computational harmonic analysis and machine learning. Kondor holds a Ph.D. in Computer Science from Columbia University, an MS in Knowledge Discovery and Data Mining from Carnegie Mellon University, and a BA in Mathematics from the University of Cambridge. He also holds a diploma in Computational Fluid Dynamics from the Von Karman Institute for Fluid Dynamics and a diploma in Physics from Eötvös Loránd University in Budapest.

    Lorenzo Orecchia is an assistant professor in the Department of Computer Science. His research focuses on applying mathematical techniques from discrete and continuous optimization to design algorithms for computational challenges arising in a variety of applications, including Machine Learning, Numerical Analysis and Combinatorial Optimization.

    Since 2018, I have been an Assistant Professor in the Department of Statistics at the University of Chicago, where I am a member of the Committee on Computational and Applied Mathematics. Previously, I was a postdoctoral research associate and a member of the Data Science Initiative at Brown University. In 2016 I completed my PhD in Mathematics and Statistics at the University of Warwick, UK, under the supervision of Andrew Stuart and Gareth Roberts.

    My research interests are in graph-based learning, inverse problems, and data assimilation. The main theme that drives my research across these three disciplines is the desire to blend complex predictive models with large datasets. My work addresses both theoretical and computational challenges motivated by data-centric applications.

    My work is currently funded by the National Science Foundation, the National Geospatial-Intelligence Agency, and the BBVA Foundation. I have been awarded the 2020 José Luis Rubio de Francia Prize, given by the Spanish Royal Society of Mathematics to the best Spanish mathematician under 32.

    I am the organizer of the CAM Colloquium.

    Chenhao Tan is an assistant professor at the Department of Computer Science and the Department of Information Science (by courtesy) at University of Colorado Boulder. His main research interests include language and social dynamics, human-centered machine learning, and multi-community engagement. He is also broadly interested in computational social science, natural language processing, and artificial intelligence.

    Homepage.

    David Uminsky joined the University of Chicago in September 2020 as a senior research associate and Executive Director of Data Science. He was previously an associate professor of Mathematics and Executive Director of the Data Institute at the University of San Francisco (USF). His research interests are in machine learning, signal processing, pattern formation, and dynamical systems. David is an associate editor of the Harvard Data Science Review. He was selected in 2015 by the National Academy of Sciences as a Kavli Frontiers of Science Fellow. He is also the founding Director of the BS in Data Science at USF and served as Director of the MS in Data Science program from 2014 to 2019. During the summer of 2018, David served as the Director of Research for the Mathematical Sciences Research Institute Undergraduate Program on the topic of Mathematical Data Science.

    Before joining USF he was a combined NSF and UC President’s Fellow at UCLA, where he was awarded the Chancellor’s Award for outstanding postdoctoral research. He holds a Ph.D. in Mathematics from Boston University and a BS in Mathematics from Harvey Mudd College.

    I was previously a Distinguished Postdoctoral Researcher in the department of statistics at Columbia University, where I worked with the groups of David Blei and Peter Orbanz. I completed my Ph.D. in statistics at the University of Toronto, where I was advised by Daniel Roy. In a previous life, I worked on quantum computing at the University of Waterloo. I won a number of awards, including the Pierre Robillard award for best statistics thesis in Canada.

    I am an assistant professor of Statistics and Data Science at the University of Chicago and a research scientist at Google Cambridge. My recent work revolves around the intersection of machine learning and causal inference, as well as the design and evaluation of safe and credible AI systems. Other notable areas of interest include network data and the foundations of learning and statistical inference.

    Rebecca Willett is a Professor of Statistics and Computer Science at the University of Chicago. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005 and was an Assistant and then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. She was an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison from 2013 to 2018. Prof. Willett received the National Science Foundation CAREER Award in 2007, was a member of the DARPA Computer Science Study Group from 2007 to 2011, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. Prof. Willett has also held visiting researcher positions at the Institute for Pure and Applied Mathematics at UCLA in 2004, the University of Wisconsin-Madison from 2003 to 2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Medical Systems (now GE Healthcare) in 2002. Her research interests include network and imaging science with applications in medical imaging, wireless sensor networks, astronomy, and social networks. She is also an instructor for FEMMES (Females Excelling More in Math, Engineering, and Science) and a local exhibit leader for Sally Ride Festivals. She was a recipient of the National Science Foundation Graduate Research Fellowship, the Rice University Presidential Scholarship, the Society of Women Engineers Caterpillar Scholarship, and the Angier B. Duke Memorial Scholarship.

    Homepage

    Heather Zheng is the Neubauer Professor of Computer Science at the University of Chicago. She received her PhD in Electrical and Computer Engineering from the University of Maryland, College Park in 1999. Prior to joining the University of Chicago in 2017, she spent 6 years in industry labs (Bell Labs, NJ, and Microsoft Research Asia) and 12 years at the University of California, Santa Barbara. At UChicago, she co-directs the SAND Lab (Systems, Algorithms, Networking and Data) together with Prof. Ben Y. Zhao.

    Homepage.

    Faculty Mentors

    Anjali Adukia is an assistant professor at the University of Chicago Harris School of Public Policy and the College. In her work, she is interested in understanding how to reduce inequalities such that children from historically disadvantaged backgrounds have equal opportunities to fully develop their potential.  Her research is focused on understanding factors that motivate and shape behavior, preferences, attitudes, and educational decision-making, with a particular focus on early-life influences.  She examines how the provision of basic needs—such as safety, health, justice, and representation—can increase school participation and improve child outcomes in developing contexts.

    Adukia completed her doctoral degree at the Harvard University Graduate School of Education, with an academic focus on the economics of education. Her work has been funded by organizations such as the William T. Grant Foundation, the National Academy of Education, and the Spencer Foundation. Her dissertation won awards from the Association for Public Policy Analysis and Management (APPAM), the Association for Education Finance and Policy (AEFP), and the Comparative and International Education Society (CIES). Adukia received recognition for her teaching from the University of Chicago Feminist Forum. She completed her master’s degrees in international education policy and in higher education (administration, planning, and social policy) at Harvard University and her bachelor of science degree in molecular and integrative physiology at the University of Illinois at Urbana-Champaign. She is a faculty research fellow of the National Bureau of Economic Research and a faculty affiliate of the University of Chicago Education Lab and Crime Lab. She is on the editorial board of Education Finance and Policy. She was formerly a board member of the Young Nonprofit Professionals Network – San Francisco Bay Area. She continues to work with non-governmental organizations internationally, such as UNICEF and Manav Sadhna in Gujarat, India.

    Chibueze Amanchukwu is a Neubauer Family Assistant Professor at the Pritzker School of Molecular Engineering at the University of Chicago. He received his bachelor’s degree in chemical engineering from Texas A&M University as the department’s Outstanding Graduating Student, and his PhD in chemical engineering from the Massachusetts Institute of Technology.

    As a graduate student with Paula Hammond, he elucidated polymer degradation mechanisms and tuned polymer electrolyte behavior in lithium-air batteries. His graduate work was supported by the National Defense Science and Engineering Graduate (NDSEG) Fellowship, GEM Fellowship, and the Alfred P. Sloan Minority Fellowship. As a postdoctoral fellow with Zhenan Bao at Stanford University, he developed new small molecule electrolytes that decoupled ionic conductivity from electrochemical instability for lithium metal batteries. His postdoctoral work was supported by the TomKat Center Postdoctoral Fellowship in Sustainable Energy at Stanford. His research has been recognized with awards from the American Chemical Society (Excellence in Graduate Polymer Research) and the American Institute of Chemical Engineers (Session’s Best Paper).

    Marshini Chetty is an assistant professor in the Department of Computer Science at the University of Chicago, where she co-directs the Amyoli Internet Research Lab (AIR lab). She specializes in human-computer interaction, usable privacy and security, and ubiquitous computing. Marshini designs, implements, and evaluates technologies to help users manage different aspects of Internet use, from privacy and security to performance and costs. She often works in resource-constrained settings and uses her work to help inform Internet policy. She has a Ph.D. in Human-Centered Computing from the Georgia Institute of Technology, USA, and a Master’s and Bachelor’s in Computer Science from the University of Cape Town, South Africa. In her former lives, Marshini was on the faculty in the Computer Science Department at Princeton University and the College of Information Studies at the University of Maryland, College Park. Her work has won best paper awards at SOUPS, CHI, and CSCW and has been funded by the National Science Foundation, the National Security Agency, Intel, Microsoft, Facebook, and multiple Google Faculty Research Awards.

    Homepage.

    I received a PhD in physics from the University of Ottawa (2004), and completed a postdoctoral fellowship at the Center for Neural Science at New York University (2017). I was previously a professor of mathematics at the University of Pittsburgh, where I was co-director of the Program in Neural Computation at the Neuroscience Institute at Carnegie Mellon (2007-2020).  I have received several awards, including an Alfred P. Sloan Research Fellowship in Neuroscience, a Vannevar Bush faculty fellowship award, and a Chancellor’s Distinguished Research Award from the University of Pittsburgh.

    My research focuses on a combination of nonlinear dynamics and statistical mechanics, with an emphasis on the genesis and transfer of variability in neural circuits. I have developed core theoretical insights that have made contributions to both neural coding and network learning. Throughout my research career, I have collaborated with experimental colleagues who work in the electrosensory, olfactory, somatosensory, auditory, and visual systems.

    Raul Castro Fernandez is an Assistant Professor of Computer Science at the University of Chicago. In his research he builds systems for discovering, preparing, and processing data. The goal of his research is to understand and exploit the value of data. He often uses techniques from data management, statistics, and machine learning. His main effort these days is on building platforms to support markets of data. This is part of a larger research effort on understanding the Economics of Data. He’s part of ChiData, the data systems research group at The University of Chicago.

    Homepage.

    I am an Assistant Professor of Computer Science at the University of Chicago. I founded and direct 3DL (threedle!), a group of enthusiastic researchers passionate about 3D, machine learning, and visual computing. I obtained my Ph.D. in 2021 from Tel Aviv University under the supervision of Daniel Cohen-Or and Raja Giryes.

    My research is focused on building artificial intelligence for 3D data, spanning the fields of computer graphics, machine learning, and computer vision. Deep learning, the most popular form of artificial intelligence, has unlocked remarkable success on structured data (such as text, images, and video), and I am interested in harnessing the potential of these techniques to enable effective operation on unstructured 3D geometric data.

    We have developed a convolutional neural network designed specifically for meshes, and also explored how to learn from the internal data within a single shape (for surface reconstruction, geometric texture synthesis, and point cloud consolidation), and I am interested in broader applications related to these areas. Additional research directions that I am aiming to explore include: intertwining human and machine-based creativity to advance our capabilities in 3D shape modeling and animation; learning with less supervision, for example to extract patterns and relationships from large shape collections; and making 3D neural networks more “interpretable/explainable”.

    I’m an assistant professor in the Department of Statistics at the University of Chicago. I am also a member of the Committee on Computational and Applied Mathematics (CCAM). I am interested in computational problems in structural biology and quantum many-body physics. I was extremely fortunate to have Amit Singer at Princeton as my Ph.D. adviser during 2012-2016, Lexing Ying at Stanford as my post-doc mentor during 2016-2019, and Phuan Ong at Princeton supervising my master’s thesis in experimental physics during 2010-2012.

    Sanjay Krishnan is an Assistant Professor of Computer Science. His research group studies the theory and practice of building decision systems that are robust to corrupted, missing, or otherwise uncertain data. His research brings together ideas from statistics/machine learning and database systems. His research group is currently studying systems that can analyze large amounts of video, certifiable accuracy guarantees in partially complete databases, and theoretical lower-bounds for lossy compression in relational databases.

    Homepage.

    My main research interests are in speech and language processing, as well as related aspects of machine learning.

    I am an Associate Professor at TTI-Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus. We are recruiting students to our PhD program and visiting student program, as well as additional faculty, including in speech and language-related areas (more on Speech and Language at TTIC).

    I completed my PhD in 2005 at MIT in the Spoken Language Systems group of the Computer Science and Artificial Intelligence Laboratory. In 2005-2007 I was a post-doctoral lecturer in the MIT EECS department. In Feb.-Aug. 2008 I was a Research Assistant Professor at TTI-Chicago.

    Homepage

    David Miller’s research focuses on answering open questions about the fundamental structure of matter. By studying the quarks and gluons (the particles that comprise everyday protons and neutrons) produced in the energetic collisions of protons at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland, Miller conducts measurements using the ATLAS detector that seek out the existence of never-before-seen particles and characterize the particles and forces that we know of with greater precision. Miller’s work on the properties and measurements of the experimental signatures of these quarks and gluons, or “jets,” is an integral piece of the puzzle used in the recent discovery of the Higgs boson, searches for new massive particles that decay into boosted top quarks, and the hints that the elusive quark-gluon plasma may have finally been observed in collisions of lead ions.

    Besides studying these phenomena, Miller has worked extensively on the construction and operation of the ATLAS detector, including the calorimeter and tracking systems that allow for these detailed measurements. Upgrades to these systems involving colleagues at Argonne National Laboratory, CERN, and elsewhere present an enormous challenge and a significant amount of research over the next several years. Miller is also working with state-of-the-art high-speed electronics for quickly deciphering the data collected by the ATLAS detector.

    Miller received his PhD from Stanford University in 2011 and his BA in Physics from the University of Chicago in 2005. He was a McCormick Fellow in the Enrico Fermi Institute from 2011-2013.

    Homepage.

    Brian Nord uses artificial intelligence to search for clues on the origins and development of the universe. He actively works on statistical modeling of strong gravitational lenses, the cosmic microwave background, and galaxy clusters. As leader of the Deep Skies Lab, he brings together experts in computer science and technology to study questions of cosmology, including dark energy, dark matter, and the early universe, through large-scale data analysis.

    Nord has authored or co-authored nearly 50 papers. He trains scientists in public communication, advocates for science education and funding, and works to develop equitable and just research environments. As co-leader of education and public engagement at the Kavli Institute for Cosmological Physics at UChicago, he organizes Space Explorers, a program to help underrepresented minorities in high school engage in hands-on physics experiences outside the classroom. He is an associate scientist at Fermi National Accelerator Laboratory, where he is a member of the Machine Intelligence Group.

    Homepage.

    Samantha Riesenfeld is an Assistant Professor of Molecular Engineering and of Genetic Medicine, a member of the Committee on Immunology, an Associate Member of the Comprehensive Cancer Center, and co-director of the new Computational and Systems Immunology PhD track in Immunology and Molecular Engineering. She leads an interdisciplinary research program focused on developing and applying genomics-based machine learning approaches to investigate the cellular components, transcriptional circuitry, and dynamics underlying complex biological systems, with a special interest in inflammatory immune responses and solid tumor cancer.

    I am a Professor at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the University of Chicago campus. I also hold a part-time faculty appointment at the University of Chicago Department of Computer Science. Prior to coming to TTIC, I was a post-doctoral researcher at the Department of Computer Science of Brown University, working with Michael Black. I received my PhD degree at MIT, where I worked at CSAIL with Trevor Darrell. I obtained my MSc degree at the Computer Science Department of the Technion, Israel Institute of Technology in Haifa, Israel, and my undergraduate degree in Math and CS from the Hebrew University in Jerusalem, Israel.

    My CV: PDF

    My research interests include:

    • Image understanding, including standard tasks like object detection and panoptic segmentation, and novel definitions of scene parsing and understanding
    • Perception of the 3D world from images and videos
    • Vision and language, in particular purposeful/informative image descriptions
    • Synthesis and perception of non-photorealistic imagery
    • Automatic processing and recognition of sign language
    • Machine learning: example-based methods; un-, self-, and semi-supervised learning of representations

    Samuel L. Volchenboum, MD, PhD, MS, is an expert in pediatric cancers and blood disorders. He has a special interest in treating children with neuroblastoma, a tumor of the sympathetic nervous system.

    In addition to caring for patients, Dr. Volchenboum studies ways to harness computers to enable research and foster innovation using large data sets. He directs the development of the International Neuroblastoma Risk Group Database project, which connects international patient data with external information such as genomic data and tissue availability. The Center he runs provides computational support for the Biological Sciences Division at the University of Chicago, including high-performance computing, applications development, bioinformatics, and access to the clinical research data warehouse.

    Homepage

  • Application

    Application Timeline

    • Student Application Deadline: September 30th, 2021 – 11:59pm CT
    • Faculty Nomination Deadline: October 4th, 2021 – 11:59pm CT
    • Notification Deadline: Week of October 11th, 2021
    • Workshop: November 11-12th, 2021

    Application Requirements

    The application is available through InfoReady. If you have not previously used InfoReady, you will be required to create an account in order to submit your application.

    • Resume/CV
    • Biography (100 words)
    • Research talk title
    • Research talk abstract (250 words)
    • Research keywords
    • Research statement outlining research goals, potential projects of interest, and long-term career goals (2 pages, standard font, size 11 or larger)
      • References for the research statement may go onto an additional third page.
    • Nomination form indicating support from at least 1 and up to 2 faculty members (form available on the application, linked above/below)

    Eligibility & Guidelines:

    If you have any questions about your eligibility, please feel free to send an email to cdac@uchicago.edu.

    • Applicants must be full time graduate students within ~1-2 years of obtaining a PhD, or a current postdoctoral scholar, fellow, or researcher.
    • We welcome applicants from a wide variety of fields and backgrounds: any eligible PhD or postdoc who is engaging in rigorous, data-driven inquiry is encouraged to apply.
    • Applicants both from and outside of the University of Chicago are encouraged to apply.
    • Applicants may only submit one application.
    • Applicants may have nominations from a maximum of 2 faculty members or advisors.

    Review Criteria

    Proposals will be reviewed by the Rising Stars in Data Science Committee using the following scoring rubric (0-3 points per criterion):

    • Research Potential: Overall potential for research excellence, as demonstrated by the research statement, research goals, and long-term career goals.
    • Academic Progress: Academic progress to date, as evidenced by publications and endorsements from the applicant’s faculty advisor or nominator.
    • Impact: Approaches, methods, or theory that advance research innovation in interdisciplinary or foundational data science, or that address real-world challenges.
    • Background in Data Science Fundamentals: Experience or coursework in computer science, statistics, data science, AI or a related field.

    Due to the volume of applications we receive, we will be unable to provide reviewer feedback on applications that are not accepted.

    Apply Here

  • FAQ

    Any questions regarding the application or the workshop can be directed to cdac@uchicago.edu.

    • Do I have to be in a data science program (PhD or postdoc) to apply?

      We do not require that applicants currently be in a data science program. However, applicants should be pursuing doctoral degrees or postdocs in computer science, statistics, computational and applied math, data science, or a related computational field.

    • Who can apply?

      Applicants must be full time graduate students within ~1-2 years of obtaining a PhD, or a current postdoctoral scholar, fellow, or researcher. If you have any questions about your eligibility, please feel free to send an email to cdac@uchicago.edu.

    • Will the Autumn 2021 workshop be remote or in person?

      We are committed to providing a safe and healthy environment for UChicago Rising Stars attendees, and will abide by recommendations made by UChicago Medicine and the University of Chicago leadership regarding the impact of the ongoing coronavirus pandemic on travel and in-person events. The structure of the workshop may change based on regulations put in place to ensure everyone’s health and safety, but we remain committed to creating an event that is engaging, supportive, and informative for all participants. We will immediately inform workshop participants if alternative plans are necessary, and will update all communication channels, including the program website.

      This convening is open to all invitees regardless of vaccination status and, because of ongoing health risks, particularly to the unvaccinated, participants are expected to adopt the risk mitigation measures (masking and social distancing, etc.) appropriate to their vaccination status as advised by public health officials or to their individual vulnerabilities as advised by a medical professional. Public convening may not be safe for all and carries a risk for contracting COVID-19, particularly for those unvaccinated. Participants will not know the vaccination status of others and should follow appropriate risk mitigation measures.

    • When will the workshop agenda be available?

      The workshop agenda will be available in October.

    • Who is the audience for the student research talks?

      The primary audience for the student research talks is the Rising Stars Committee, along with PhD students, researchers, faculty, and staff in the data science ecosystem at UChicago. Certain panels and workshop events will be open to the public. Registration for the event will open in October.

    • What should the research talk cover?

      The research talk should highlight your research interests in data science and computing, and ideally showcase your unique approach to this nascent field as it takes shape.

    • Do I need to cover established research in my talk, or can it cover early-stage projects?

      Your talk can draw on work that is either early-stage or published, so long as you are confident that it best showcases your research methodology and approach.