Unifying Large Scale Data Preprocessing and ML Pipelines with Ray Datasets, PyData Global 2021
Speakers: Alex Wu, Clark Zinzow

Summary: ML tasks such as distributed training and batch inference stretch the abstractions of modern data processing systems, forcing tradeoffs in performance or learning efficiency. In this talk we introduce Ray Datasets, a universal compatibility layer built on Arrow and Python that allows data processing to be combined with ML pipelines without such tradeoffs.

PyData is an educational program of NumFOCUS, a 501(c)(3) nonprofit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyD