The rise of Big Data has created new demand for Machine Learning (ML) systems that can learn complex models, often with millions to billions of parameters, that promise adequate capacity to analyze massive datasets and offer predictive functions thereupon. For example, in many modern applications such as web-scale content extraction via topic models, genome-wide association mapping via sparse structured regression, and image understanding via deep neural networks, one needs to handle Big ML problems that threaten to exceed the limits of current architectures and algorithms. In this tutorial, we present a systematic overview of modern scalable ML approaches for such applications: the insights and challenges of designing scalable and parallelizable algorithms for working with Big Data and Big Models; the principles and architectures of building distributed systems for executing these models and algorithms; and the theory and analysis necessary for understanding the behavior of these models, algorithms, and systems and for providing guarantees about them.
We present a comprehensive, principled, yet highly unified and application-grounded view of the fundamentals and strategies underlying a wide range of modern ML programs practiced in industry and academia. We begin by introducing the basic algorithmic roadmaps of optimization-theoretic and probabilistic-inference methods, the two major workhorse algorithmic engines that power nearly all ML programs, along with the technical developments therein that target large scales through algorithmic acceleration, stochastic approximation, and parallelization. We then turn to the challenges such algorithms face in a practical distributed computing environment, including memory and storage limits, communication bottlenecks, resource contention, and stragglers, and we review modern parallelization strategies and distributed frameworks that can actually run these algorithms at Big Data and Big Model scales, while also exposing the theoretical insights that make such systems and strategies possible. We focus on what makes ML algorithms peculiar, and how this can lead to algorithmic and systems designs that are markedly different from today’s Big Data platforms. We discuss these new opportunities in algorithms, systems, and theory for parallel machine learning in real (rather than ideal) distributed communication, storage, and computing environments.
Download the slides here.
Eric Xing is a Professor of Machine Learning in the School of Computer Science at Carnegie Mellon University, and Director of the CMU/UPMC Center for Machine Learning and Health. His principal research interests lie in the development of machine learning and statistical methodology, and of large-scale computational systems and architectures, especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in artificial, biological, and social systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley. He serves (or has served) as an associate editor of the Annals of Applied Statistics (AOAS), the Journal of the American Statistical Association (JASA), the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and the PLoS Journal of Computational Biology, and as an action editor of the Machine Learning Journal (MLJ) and the Journal of Machine Learning Research (JMLR). He was a member of the DARPA Information Science and Technology (ISAT) Advisory Group, and is a recipient of the NSF CAREER Award, the Sloan Fellowship, the United States Air Force Young Investigator Award, and the IBM Open Collaborative Research Award. He is the Program Chair of ICML 2014.
Qirong Ho is a scientist at the Institute for Infocomm Research, A*STAR, Singapore, and an adjunct assistant professor at the Singapore Management University School of Information Systems. His primary research focus is distributed cluster software systems for Machine Learning at Big Data scales, with a view towards correctness and performance guarantees. In addition, Dr. Ho has performed research on statistical models for large-scale network analysis (particularly latent space models for visualization, community detection, user personalization, and interest prediction), as well as social media analysis on hyperlinked documents with text and network data. Dr. Ho received his Ph.D. in 2014 under Eric P. Xing at Carnegie Mellon University's Machine Learning Department. He is a recipient of the 2015 KDD Dissertation Award (runner-up), and the Singapore A*STAR National Science Search Undergraduate and Ph.D. fellowships.