GPU-Optimized Deep Learning Networks for Automatic Speech Recognition
Jessica Ray (MIT Lincoln Laboratory), 2014

In this talk, we compare the implementation of deep learning networks [1] on traditional x86 processors with the implementation on NVIDIA Tesla K20 GPU accelerators, for the purposes of training Restricted Boltzmann Machines [2] and for deep-network backpropagation in a large-vocabulary speech recognition task (automatic transcription of TED talks). Two GPU implementations are compared: 1) a high-level implementation using Theano [3], and 2) a native implementation using low-level CUDA BLAS libraries. We describe the scaling properties of these implementations, as a function of training data size, relative to a baseline batched x86 implementation. We also explore the development-time tradeoffs of each implementation.
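To give a feel for the development-time contrast the abstract raises, the following is a minimal sketch (not from the talk) of the high-level Theano route: computing the hidden-unit activation probabilities of an RBM for a mini-batch. The layer sizes, batch size, and variable names are illustrative assumptions. When Theano is configured with device=gpu, the dot product below is compiled to GPU kernels automatically, with no hand-written CUDA.

    # Minimal Theano sketch of an RBM "propagate up" step.
    # All dimensions and names below are illustrative, not from the talk.
    import numpy as np
    import theano
    import theano.tensor as T

    n_visible, n_hidden = 784, 512   # assumed RBM layer sizes

    rng = np.random.RandomState(0)
    W = theano.shared(
        rng.normal(0, 0.01, (n_visible, n_hidden)).astype(theano.config.floatX),
        name='W')
    b_h = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX),
                        name='b_h')

    v = T.matrix('v')  # a mini-batch of visible units, one row per example
    # Hidden-unit activation probabilities: sigmoid(v W + b_h)
    h_prob = T.nnet.sigmoid(T.dot(v, W) + b_h)

    # theano.function compiles the symbolic graph for CPU or GPU
    # transparently, depending on Theano's device configuration.
    propup = theano.function([v], h_prob)

    batch = rng.rand(128, n_visible).astype(theano.config.floatX)
    print(propup(batch).shape)  # (128, 512)

By contrast, the native implementation described in the abstract would express the same matrix product as explicit CUDA BLAS (GEMM) calls plus manual device-memory management, which is the development-time tradeoff being compared.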