Foundations for Automated Data Science
Data science, despite its clear value, has still not received a satisfactory formal treatment as a discipline. Many regard it as a pragmatic black art, largely because data preparation, model deployment, and the many practical model desiderata beyond simple predictive accuracy are generally not covered in courses or textbooks on statistics or machine learning, at least not in a way guided by rigorous underlying principles. This has resulted, for example, in much data science education focusing on the ability to use a collection of specific tools. It has also resulted in subtle but significant conceptual errors being widespread in practice, even among PhDs at major institutions. In this talk I will present a mathematical model of data science that can clarify and guide these important pragmatic aspects, rather than simply attributing best practice to heuristics, general experience, or domain knowledge. I will discuss open practical issues in data science, including lessons from extensive user studies, show how such a theoretical foundation can address them, and finally show how these principles can translate into new practical data science tools, in the form of user experiences that are both graphical and programmatic (libraries and languages).
Friday, October 25, 2019
Welcome & Introductions
Alexander Gray serves as VP of Foundations of AI at IBM, where he leads IBM’s basic AI research globally. He previously served as CEO and CTO of Skytree, which he co-founded, and then at Infosys as GM of Research and Fellow. Prior to that, he was a tenured Associate Professor at the Georgia Institute of Technology. Since his start at NASA in 1993, a theme of his research has been the computational aspects of machine learning for handling massive datasets, long predating the “big data” movement in industry. His work helped enable the journal Science’s Top Breakthrough of 2003 and has won a number of research awards. He served as a member of the 2010 National Academy of Sciences Committee on the Analysis of Massive Data, was a National Academy of Sciences Kavli Scholar, and has been a frequent advisor and speaker on large-scale machine learning and data science at top research conferences, government agencies, and leading corporations. He received AB degrees in Applied Mathematics and Computer Science from UC Berkeley and a PhD in Computer Science from Carnegie Mellon University. His current interests are automated data science, automated programming, and new formalisms for AI beyond today’s machine learning, toward achieving reading comprehension and strong AI.