Accelerating ML Inference at Scale with ONNX, Triton and Seldon, PyData Global 2021

Speaker: Alejandro Saucedo

Summary

Identifying the right tools for high-performance production machine learning can be overwhelming as the ecosystem continues to grow at breakneck speed. In this session we showcase how practitioners can optimize and productionise ML models in scalable ecosystems without having to deal with the underlying infrastructure. We'll be optimizing a GPT-2 model with ONNX and deploying it to Triton using Seldon Tempo.

Description

Identifying the right tools for high-performance production machine learning can be overwhelming as the ecosystem continues to grow at breakneck speed. In this session we aim to provide a hands-on guide on how practitioners can productionise optimized machine learning models in scalable ecosystems using production-ready open source tools and frameworks. We will dive into a practical use case, deploying the renowned GPT-2 NLP machine learning model using the Tempo SDK, whi