1/28/21 #10 Travis Addair - Deep Learning at Scale with Horovod

2/23/2022

Stanford MLSys Seminar

Travis Addair - Horovod and the Evolution of Deep Learning at Scale

Deep neural networks are pushing the state of the art in numerous machine learning research domains, from computer vision to natural language processing and even tabular business data. However, scaling such models to train efficiently on large datasets poses a unique set of challenges that traditional batch data processing systems were not designed to solve. Horovod is an open-source framework that scales models written in TensorFlow, PyTorch, and MXNet to train seamlessly on hundreds of GPUs in parallel. In this talk, we'll explain the concepts and unique constraints that led to the development of Horovod at Uber, and discuss how the latest trends in deep learning research are informing the future direction of the project within the Linux Foundation. We'll explore how Horovod fits into production ML workflows in industry, and how tools like Spark and Ray can combine with Horovod to make productionizing deep learning at scale in remote data centers as simple as running locally on your laptop. Finally, we'll share some thoughts on what's next for large-scale deep learning, including new distributed training architectures and how the larger ecosystem of production ML tooling is evolving.

More episodes from "Stanford MLSys Seminar"

#62 Dan Fu - Improving Transfer and Robustness of Supervised Contrastive Learning

#61 Kexin Rong - Big Data Analytics

#60 Igor Markov - Looper: An End-to-End ML Platform for Product Decisions

#59 Zhuohan Li - Alpa: Automated Model-Parallel Deep Learning

3/10/22 #58 Shruti Bhosale - Multilingual Machine Translation

3/3/22 #57 Vijay Janapa Reddi - TinyML, Harvard Style

2/24/22 #56 Fait Poms - Interactive Model Development

1/28/21 #10 Travis Addair - Deep Learning at Scale with Horovod

2/17/22 #55 Doris Lee - Visualization for Data Science

1/21/21 #9 Song Han - Reducing AI's Carbon Footprint