  • Overview

    OVERVIEW

    Collaborators from UChicago, Argonne National Laboratory, and Fermi National Accelerator Laboratory have developed a workshop that bridges the transition from introductory computer science classes to data science research. The weeklong, virtual workshop will introduce students to the data science research lifecycle, teach essential computational skills for data analysis and visualization, and provide training on how to communicate findings. The workshop will focus on creating a continuous learning environment that carries students from structured classroom studies to more experimental, inquiry-driven research work in small groups. Read more about the March 2021 inaugural workshop here.

    MOTIVATION

    In most Chicago Public Schools (CPS) high schools, students’ exposure to computer science is limited to one survey course, with little coverage of advanced topics such as artificial intelligence, data science, or the application of computer science to societal issues. This lack of opportunity is perpetuated when students seek internships and other employment experiences without the confidence in their own knowledge to see computer science, data science, or artificial intelligence as a possible career pathway for themselves. To address this need, a team of researchers and educators from Argonne, UChicago’s Center for Data and Computing, and Fermilab is developing a data science bridge workshop that supports students from Chicago’s South Side community in developing a deeper understanding of data science and growing a tangible skill set grounded in scientific projects, real-world datasets, and professional tools. Through this program, students will explore the foundational concepts of computer science and data science, working with authentic and complex datasets and leveraging principles of AI to gain insights from data and make predictions.

    WORKSHOP FOCUS & APPROACH

    The workshop will be taught using AI+Science case studies, each of which contains a real-world scientific challenge (e.g., COVID-19), an authentic dataset, and associated professional tools. The case studies will draw on data generated by scientific research projects from the three institutions. Using the Python language, students will explore data structures with an emphasis on multidimensional arrays, manipulating and visualizing them with libraries commonly used in scientific computing, such as NumPy, Pandas, and Matplotlib. The datasets will expose students to many of the challenges associated with scientific data and help them build the skills to perform statistical analysis and prediction while exploring real-world problems.
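    As a rough illustration of the kind of array manipulation described above, the following minimal sketch builds a small three-dimensional NumPy array and summarizes it with Pandas. The data are synthetic, and the variable names (`cases`, `weekly_totals`, the region labels) are hypothetical; nothing here is taken from the workshop's actual case studies. Plotting with Matplotlib is left out so that the sketch runs without a display.

```python
import numpy as np
import pandas as pd

# Synthetic, made-up counts: 2 weeks x 7 days x 3 regions.
# A fixed seed keeps the example reproducible.
rng = np.random.default_rng(seed=0)
cases = rng.integers(low=0, high=100, size=(2, 7, 3))

# Collapse the "day" axis (axis 1) to get one total per week and region.
weekly_totals = cases.sum(axis=1)      # shape (2, 3)

# Average over weeks and regions to get one mean per day of the week.
daily_mean = cases.mean(axis=(0, 2))   # shape (7,)

# Wrap the 2-D result in a labeled table for easier inspection.
table = pd.DataFrame(
    weekly_totals,
    index=["week_1", "week_2"],
    columns=["region_a", "region_b", "region_c"],
)

# Basic descriptive statistics (count, mean, std, quartiles) per region.
summary = table.describe()
print(table)
print(summary)
```

    The `axis` argument is the main idea: it names which dimension of the array is collapsed, so the same data can be summarized by week, by day, or by region.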

    The workshop will model the collaborative nature of computer science by placing students in teams, with guidance and support from staff, including undergraduate and graduate mentors from the three institutions.

    The workshop is funded by a grant from UChicago’s Office of Research and National Laboratories Joint Task Force Initiative.

  • Data4All Team

    Workshop Committee

    Meridith Bruozas is the Manager of Educational Programs and Outreach, with a focus on developing high-quality educational programs that are aligned to mission science and support the development of the next generation of scientists and engineers. She leads a highly effective team in developing and implementing on- and off-site programming that ranges from inspiring and connecting middle school youth to STEM experiences to providing undergraduate and graduate students once-in-a-lifetime internship experiences. She also focuses on developing key partnerships with local and regional organizations and school systems around STEM education. Her goal is to create high-quality connected programming that provides equitable access to all students interested in pursuing STEM from middle to graduate level.

    Ms. Bruozas is an educator by training and has spent the past 20 years in a combination of district, nonprofit and academic leadership positions researching and promoting STEM education. She earned her M.S. in Learning Sciences from Northwestern University and a B.S. in Secondary Education and Biology from Ball State University. Her publications include several national science curricula for middle and high school classrooms and numerous professional presentations and research articles.

    Ms. Bruozas is a member of the National Association for Research in Science Teaching, the National Science Teachers Association, and the International Society for Technology in Education.

    Kyle Chard is a Research Assistant Professor in the Department of Computer Science at the University of Chicago and Argonne National Laboratory. He has been Program Director of the Data & Computing Summer Lab since its first iteration under CDAC in 2019, and previously oversaw the Summer Internship Program run by the former Computation Institute.

    He received his Ph.D. in Computer Science from Victoria University of Wellington in 2011. He co-leads the Globus Labs research group which focuses on a broad range of research problems in data-intensive computing and research data management. He currently leads projects related to parallel programming in Python, scientific reproducibility, and elastic and cost-aware use of cloud infrastructure.

    John Domyancich is Education Programs and Outreach’s Learning Center Lead, where he leads a team of Educators that focus on engaging middle through high school students in scientific inquiry, creating immersive experiences that highlight the work and mission of Argonne research. John plans and orchestrates Argonne’s summer camp and high school research programs, and he is also responsible for the Learning Lab field trips. His goal is to inspire and guide the next generation of Argonne scientists through STEM pathways.

    John taught high school science for 11 years before joining Argonne in 2015.  During his teaching career, he created student-centered classrooms and developed STEM curricula to integrate collaboration, technology and conceptual model development into the learning experience. To prepare himself for a career teaching science, he earned an M.A. in Secondary Education from Western Illinois University as well as a B.S. in Chemistry from the University of Iowa.

    John is a member of the American Modeling Teachers Association and the Illinois Association of Chemistry Teachers. He is also active in the National Math and Science Initiative and the College Board. Outside work, he likes to run and spend time with his wife and three daughters.

    Julia Lane is the Executive Director of the Center for Data and Computing, responsible for shaping and executing the strategic vision of CDAC, building new research partnerships and outreach strategies to foster interdisciplinary collaborations, and ensuring that the University continues to broaden applications of data science and computing approaches.

    Brian Nord uses artificial intelligence to search for clues on the origins and development of the universe. He actively works on statistical modeling of strong gravitational lenses, the cosmic microwave background, and galaxy clusters. As leader of the Deep Skies Lab, he brings together experts in computer science and technology to study questions of cosmology, including dark energy, dark matter, and the early universe, through large-scale data analysis.

    Nord has authored or co-authored nearly 50 papers. He trains scientists in public communication, advocates for science education and funding, and works to develop equitable and just research environments. As co-leader of education and public engagement at the Kavli Institute for Cosmological Physics at UChicago, he organizes Space Explorers, a program to help underrepresented minorities in high school engage in hands-on physics experiences outside the classroom. He is an associate scientist at Fermi National Accelerator Laboratory, where he is a member of the Machine Intelligence Group.

    Michael E. Papka is a senior scientist at Argonne National Laboratory, where he is also deputy associate laboratory director for Computing, Environment and Life Sciences (CELS) and division director of the Argonne Leadership Computing Facility (ALCF). Both his laboratory leadership roles and his research interests relate to high-performance computing in support of scientific discovery.

    Within the CELS directorate, Mike supports programmatic efforts spanning Argonne that contribute to, or benefit from, advanced computing. At ALCF, he oversees a U.S. Department of Energy user facility that houses two of the world’s fastest supercomputers and enables the research community to pursue major discoveries and innovations through open science.

    In addition to his duties at Argonne, Mike is a professor of computer science at Northern Illinois University (NIU), where he teaches foundational concepts of computer science and advanced topics in data analytics and data science. He is also founder and co-director of NIU’s Data, Devices, and Interaction Laboratory (ddiLab), a collaborative workspace for undergraduate and graduate students to conduct computer science research with an emphasis on visualization and data analysis coupled to high-performance computing.

    Mike has a B.S. in physics from Northern Illinois University, an M.S. in computer science and electrical engineering from the University of Illinois at Chicago, and an M.S. and a Ph.D. in computer science from the University of Chicago.

    Katie Rosengarten is Program Manager at the Center for Data and Computing, responsible for overseeing strategic partnerships, management, execution, and evaluation of student research engagement opportunities for early high school learners through PhD students.

    Curriculum Development Group

    John Domyancich is Education Programs and Outreach’s Learning Center Lead, where he leads a team of Educators that focus on engaging middle through high school students in scientific inquiry, creating immersive experiences that highlight the work and mission of Argonne research. John plans and orchestrates Argonne’s summer camp and high school research programs, and he is also responsible for the Learning Lab field trips. His goal is to inspire and guide the next generation of Argonne scientists through STEM pathways.

    John taught high school science for 11 years before joining Argonne in 2015.  During his teaching career, he created student-centered classrooms and developed STEM curricula to integrate collaboration, technology and conceptual model development into the learning experience. To prepare himself for a career teaching science, he earned an M.A. in Secondary Education from Western Illinois University as well as a B.S. in Chemistry from the University of Iowa.

    John is a member of the American Modeling Teachers Association and the Illinois Association of Chemistry Teachers. He is also active in the National Math and Science Initiative and the College Board. Outside work, he likes to run and spend time with his wife and three daughters.

    Julia Lane is the Executive Director of the Center for Data and Computing, responsible for shaping and executing the strategic vision of CDAC, building new research partnerships and outreach strategies to foster interdisciplinary collaborations, and ensuring that the University continues to broaden applications of data science and computing approaches.

    Jesse London is a software engineer at the Center for Data and Computing, where he contributes to the CDAC Open-Source Initiative.

    https://github.com/jesteria

    Julia Koschinsky is the Executive Director of the Center for Spatial Data Science at the University of Chicago and has been part of the GeoDa team for over 16 years. She has been conducting and managing research funded through federal awards of over $8 million to gain insights from the spatial dimensions of urban challenges in housing, health, and the built environment.

    Katie Rosengarten is Program Manager at the Center for Data and Computing, responsible for overseeing strategic partnerships, management, execution, and evaluation of student research engagement opportunities for early high school learners through PhD students.

    Bio: Tyler is a Ph.D. candidate in Computer Science at the University of Chicago, advised by Kyle Chard and Ian Foster. His research interests lie at the intersection of data management, data science, and HPC, focusing on enabling scientists to maximize the utility of massive amounts of data. His work has culminated in the design of the open-source system Xtract that can intelligently formulate metadata extraction workflows for data stored in heterogeneous file formats across leadership-scale computing facilities. Before joining the University of Chicago, he received his B.A. in Applied Mathematics and Statistics from Macalester College.

    Talk Title: Enabling Data Utility Across the Sciences

    Talk Abstract: Scientific data repositories are generally chaotic—files spanning heterogeneous domains, studies, and users are stuffed into an increasingly-unsearchable data swamp without regard for organization, discoverability, or usability. Files that could contribute to scientists’ future research may be spread across storage facilities and submerged beneath petabytes of other files, rendering manual annotation and navigation virtually impossible. To remedy this lack of navigability, scientists require a rich search index of metadata, or data about data, extracted from individual files. In this talk, we will explore automated metadata extraction workflows for converting dark data swamps into navigable data collections, given no prior knowledge regarding each file’s schema or provenance. I enable such extraction from files of vastly different structures by building a robust suite of “extractors” that leverage data scientific methods (e.g., keyword analysis, entity recognition, and file type identification) in order to maximize our body of knowledge about a diversity of files.

    In this talk, I outline the construction, optimization, and evaluation of Xtract—a scalable metadata extraction system—that automatically constructs extraction plans for files distributed across remote cyberinfrastructure. I illustrate the scale challenges in processing these data, and outline techniques to maximize extraction throughput, by analyzing Xtract’s performance on four real science data sets. Finally, I will present early results of a user study in which I directly evaluate the extent to which automatically extracted metadata enables data utility in scientific research processes.
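    To make the idea of an "extractor" suite concrete, here is a minimal sketch that identifies a file's type and, for text files, pulls out a few frequent keywords. It is not Xtract's implementation or API: the function names (`identify_file_type`, `extract_keywords`, `extract_metadata`) are hypothetical, and it uses only Python's standard library as a stand-in for the methods named in the abstract.

```python
import mimetypes
import re
from collections import Counter


def identify_file_type(filename):
    """Guess a MIME type from the file name; fall back to 'unknown'."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "unknown"


def extract_keywords(text, top_n=3):
    """Return the most frequent lowercase words of four or more letters."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    return [word for word, _count in Counter(words).most_common(top_n)]


def extract_metadata(filename, text):
    """Pick an extraction step based on the file type (hypothetical plan)."""
    file_type = identify_file_type(filename)
    metadata = {"filename": filename, "file_type": file_type}
    # Only text files get keyword analysis in this sketch; other types
    # would need their own extractors (tables, images, and so on).
    if file_type.startswith("text/"):
        metadata["keywords"] = extract_keywords(text)
    return metadata


record = extract_metadata("notes.txt", "lens lens galaxy cluster lens galaxy")
print(record)
```

    The resulting dictionaries could then be loaded into a search index; how Xtract schedules and scales this work across facilities is beyond what this sketch attempts to show.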

  • Instructor Team

    Mentors

    Azucena Rodriguez is a Learning Center Instructor and is responsible for facilitating on-site activities that enable middle and high school students to explore Argonne’s unique culture of innovation and collaboration in the classroom. She also works with the Department of Energy’s Visiting Faculty Program to increase the research competitiveness of faculty members and students at institutions historically underrepresented in the research community. Her goal is to motivate the next generation of problem solvers to incorporate science into their daily lives.

    Dr. Rodriguez participated in the Northwestern Teaching Certificate Program and has a PhD in Bioengineering as well as a BS in Mechanical Engineering from the University of California. Prior to Argonne, she held numerous university instructor positions focused on leading learning activities for undergraduate and graduate students, and she has an extensive portfolio of outreach and informal learning experience.

    Azucena holds memberships in the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) and the Latino Association of Graduate Students in Science and Engineering (LAGSES). Outside of work, she enjoys hiking, visiting national parks, and exploring the outdoors with her family.

    Isabella DeClue is a sophomore at the University of Chicago majoring in Statistics and minoring in Computer Science. She is an alumna of the CDAC Data & Computing Summer Lab 2021 Program, where she worked on a project with Prof. Lorenzo Orecchia called “Learning Manifolds From Point Clouds.” After the CDAC Summer Lab, she was awarded a College Research Fellowship to continue her work on a new implementation of the Moving Least Squares algorithm with the Orecchia research group. She is also the Data Analytics Chair of compileHer, a UChicago student organization that aims to engage young girls from all across the city of Chicago through computer science education, and that is committed to closing the gender gap in the tech world by providing free, high-quality computer science education to young women.

    Bio: Tyler is a Ph.D. candidate in Computer Science at the University of Chicago, advised by Kyle Chard and Ian Foster. His research interests lie at the intersection of data management, data science, and HPC, focusing on enabling scientists to maximize the utility of massive amounts of data. His work has culminated in the design of the open-source system Xtract that can intelligently formulate metadata extraction workflows for data stored in heterogeneous file formats across leadership-scale computing facilities. Before joining the University of Chicago, he received his B.A. in Applied Mathematics and Statistics from Macalester College.

    Kelly Sturner is a Learning Lab Instructor at Argonne’s Learning Center, where she helps middle and high school students envision potential futures in STEM fields. Each year, she helps thousands of students connect with science concepts and professionals firsthand through Learning Lab field trips as well as the Creative Approaches to Problems in Science (CAPS) High School Computing Workshop and CodeGirls@Argonne summer programs. Kelly wants to help students see science as a powerful tool to understand the world with life-changing experiences.

    Kelly has a strong background in environmental science, and she holds an M.S. in Science Education from the University of Tennessee-Knoxville as well as an M.S. in Soil Science from the University of California-Davis. She first worked at Argonne as an undergraduate student and later returned to Argonne after graduate school. Kelly now leads programs translating cutting-edge science for K-12 classrooms. She remains deeply passionate about science education and outreach, having volunteered outside of work and authored a number of student curricula.

    Kelly is a member of the National Science Teachers Association (NSTA) and the International Society for Technology in Education (ISTE).

    Volunteers

    Robin Lambert Graham is a forest ecosystem ecologist with expertise in biomass resource availability for bioenergy and climate change. She is currently overseeing Argonne’s climate change and biological research for the Department of Energy.

    Prior to joining Argonne, she spent 25 years at Oak Ridge National Laboratory in Oak Ridge, Tennessee, where she managed the Oak Ridge Bioenergy Research program, served in multiple leadership positions, and conducted research for TVA, USAID, NASA, and DOE. For a decade she was the secretary for the Association of Ecosystem Research Centers. Early in her career she did forestry research for Weyerhaeuser Co. She has always enjoyed the intersection of research and application and has published extensively with economists and those outside her academic domain.

    Jacob Leppek is a graduating MSc. student at UChicago with the Computational Analysis and Public Policy program. His professional focus is on closing the technology gap within the nonprofit sector.

    Caroline Kinnen is a first-year student in the Master of Science in Computational Analysis and Public Policy (MSCAPP) program at the Harris School of Public Policy at the University of Chicago. She is studying computer science, statistics, and data management and engineering practices with application to public policy and civic tech.

    Jacquie Otmanski is a Learning Center Instructor for Argonne Educational Programs, and she prepares and facilitates key student programs at the Learning Center such as summer camps and Learning Labs. She helps create immersive youth experiences such as the All About Energy pre-internship program, which expose students firsthand to STEM topics as well as connect students to professionals. In addition, she develops at-home STEM activities for students, which are published in Education’s seasonal newsletter. Jacquie is passionate about mentoring diverse youth throughout their academic journeys, so that they can grow on academic and personal levels, develop critical STEM skills, and realize and strengthen their STEM identities.

    Over the course of her academic studies, Jacquie shifted toward merging business with engineering, covering a wide range of agricultural and environmental sciences, and she graduated from the University of Illinois with a B.S. in Technical Systems Management. She also recently completed her M.A. in Curriculum and Instruction at Concordia University of Chicago. Due to a strong desire to teach, she taught 4th-, 6th-, 7th-, and 8th-grade science as a Teach for America Corps member before joining Argonne. Prior to her current position, Jacquie was part of Education’s Outreach Team, where she developed and coordinated outreach initiatives and competitions for students, schools, and scientists/engineers as an Outreach Instructor.

    Jacquie is a member of the National Science Teachers Association (NSTA) and the Society of Hispanic Professional Engineers (SHPE).

  • Workshop Speakers

    Workshop Speakers

    Dylan Halpern is the Principal Software Engineer for the US Covid Atlas at the Center. Utilizing methods of geospatial data analytics, visualization, and web development, he works in domains of public health, urban experience and activity, and transit. He holds a Master in City Planning from MIT, and previous positions include research roles with MIT Senseable City Lab, Civic Data Design Lab, and City Form Lab, and a Fulbright research fellowship in Brazil.

    Charles Macal applies computational modeling and simulation tools to complex systems to solve problems in a variety of fields, including energy and national security.

    He is the chief scientist for the Argonne Resilient Infrastructure Initiative, and is a principal investigator for the development of the widely used Repast agent-based modeling toolkit.

    He has appointments at the University of Chicago Computation Institute and the Northwestern-Argonne Institute for Science and Engineering. He is an adjunct professor at the University of Chicago, where he teaches a course on Complex Adaptive Systems for Threat Management and Emergency Preparedness.

    He is a registered professional engineer in the State of Illinois and holds software copyrights for two systems: ELIST (Enhanced Logistics Intra-theater Support Tool) and EMCAS (Electricity Market Complex Adaptive System).

    Brandon McCallister is an Assistant Director of Admissions at the University of Chicago, from which he graduated in 2018 with a double bachelor’s in Comparative Human Development and Theater and Performance Studies. He works as the Director of the UChicago Promise Program, which aims to expand college equity and access for students living in the city of Chicago. Brandon is a Chicago native himself and spends his days looking for the best food in the city or staying home and playing with his cat.

    Brian Nord uses artificial intelligence to search for clues on the origins and development of the universe. He actively works on statistical modeling of strong gravitational lenses, the cosmic microwave background, and galaxy clusters. As leader of the Deep Skies Lab, he brings together experts in computer science and technology to study questions of cosmology, including dark energy, dark matter, and the early universe, through large-scale data analysis.

    Nord has authored or co-authored nearly 50 papers. He trains scientists in public communication, advocates for science education and funding, and works to develop equitable and just research environments. As co-leader of education and public engagement at the Kavli Institute for Cosmological Physics at UChicago, he organizes Space Explorers, a program to help underrepresented minorities in high school engage in hands-on physics experiences outside the classroom. He is an associate scientist at Fermi National Accelerator Laboratory, where he is a member of the Machine Intelligence Group.

    Ashlyn Sparrow is an independent game designer. Her work focuses on creating socially impactful games and health-focused app interventions. In 2013, Ashlyn was the Learning Technology Director of the Game Changer Chicago Design Lab at the University of Chicago, devoted to creating game-based health interventions supported by funding from the National Institutes of Health and the National Science Foundation. During her tenure she designed and led the production of The Source, S.E.E.D, Hexacago Health Academy, Bystander, and Prognosis.

    In 2018, she became the Assistant Director of the Weston Game Lab (WGL) at the Media Arts, Data, and Design (MADD) Center at the University of Chicago, where she teaches undergraduate, graduate, and K-12 students how to design their own games while uncovering the sociopolitical implications of their designs. Through WGL, she has developed a series of alternate reality games, including the IndieCade award-winning game Terrarium, A Labyrinth, and Echo. In addition to her work at WGL, she works as a game designer and programmer in Chicago, having worked on Oni Fighter Yasuke for Waking Oni Games.

    Website: ashlynsparrow.com

    David Uminsky joined the University of Chicago in September 2020 as a senior research associate and Executive Director of Data Science. He was previously an associate professor of Mathematics and Executive Director of the Data Institute at the University of San Francisco (USF). His research interests are in machine learning, signal processing, pattern formation, and dynamical systems. David is an associate editor of the Harvard Data Science Review. He was selected in 2015 by the National Academy of Sciences as a Kavli Frontiers of Science Fellow. He is also the founding Director of the BS in Data Science at USF and served as Director of the MS in Data Science program from 2014 to 2019. During the summer of 2018, David served as the Director of Research for the Mathematical Sciences Research Institute Undergrad Program on the topic of Mathematical Data Science.

    Before joining USF he was a combined NSF and UC President’s Fellow at UCLA, where he was awarded the Chancellor’s Award for outstanding postdoctoral research. He holds a Ph.D. in Mathematics from Boston University and a BS in Mathematics from Harvey Mudd College.