Self-attention in deep learning (transformers)
Self-attention is very commonly used in deep learning these days. For example, it is one of the main building blocks of the Transformer paper ("Attention Is All You Need"), which is fast becoming the go-to deep learning architecture for several problems in both computer vision and language processing. Additionally, well-known models such as BERT, GPT, XLM, and Performer all use some variation of the Transformer, which in turn is built using self-attention layers as building blocks. So this video is about understanding a simplified version of the attention mechanism in deep learning. Note: This is part 1 in the series of videos about Transformers.
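To make the idea concrete, here is a minimal sketch of a simplified self-attention step in NumPy. It is an illustrative assumption, not the exact version covered in the video: it skips the learned query/key/value projections and the scaling factor, and just weights each position's output by softmaxed dot-product similarities between the input vectors.

import numpy as np

def self_attention(X):
    """Simplified self-attention: each output row is a weighted
    average of all input rows, with weights given by a softmax
    over pairwise dot-product similarities."""
    # Pairwise similarity scores between every pair of positions
    scores = X @ X.T                               # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of all input vectors
    return weights @ X                             # (seq_len, embed_dim)

# Toy example: a "sequence" of 4 token embeddings of dimension 3
X = np.random.randn(4, 3)
print(self_attention(X).shape)                     # (4, 3)

In the full Transformer, the same idea is applied to learned projections of X (queries, keys, and values) and the scores are scaled before the softmax, but the weighted-average structure stays the same.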