Lecture 15: Efficient Methods and Hardware for Deep Learning
In Lecture 15, guest lecturer Song Han discusses algorithms and specialized hardware that can be used to accelerate training and inference of deep learning workloads. We discuss pruning, weight sharing, quantization, and other techniques for accelerating inference, as well as parallelization, mixed precision, and other techniques for accelerating training. We also discuss specialized hardware for deep learning such as GPUs, FPGAs, and ASICs, including the Tensor Cores in NVIDIA's latest Volta GPUs as well as Google's Tensor Processing Units (TPUs).
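
To make two of the inference-side ideas concrete, below is a minimal NumPy sketch of magnitude-based weight pruning followed by uniform 8-bit quantization. This is an illustration of the general techniques, not the lecture's exact pipeline (which also covers weight sharing via clustering and retraining after pruning); the function names `prune_by_magnitude` and `quantize_uniform` and all parameter choices are assumptions made for this example.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly `sparsity`
    fraction of the entries become zero (magnitude pruning)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value; keep only larger weights.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def quantize_uniform(weights: np.ndarray, num_bits: int = 8):
    """Linearly quantize weights to signed `num_bits` integers.
    Returns the integer codes and the float scale used to dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.round(weights / scale).astype(np.int8 if num_bits <= 8 else np.int32)
    return q, scale

# Toy usage: prune 90% of a random weight matrix, then quantize survivors to 8 bits.
w = np.random.randn(256, 256).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.9)
q, scale = quantize_uniform(w_pruned, num_bits=8)
w_restored = q.astype(np.float32) * scale  # dequantize for float inference
print("nonzero fraction:", np.count_nonzero(w_pruned) / w_pruned.size)
print("max abs quantization error:", np.max(np.abs(w_pruned - w_restored)))
```

In practice the pruned, quantized weights are stored in compressed form (e.g. sparse indices plus small integer codes), which is where the memory and energy savings discussed in the lecture come from; accuracy is typically recovered by fine-tuning the remaining weights after pruning.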