XCiT: Cross-Covariance Image Transformers (Facebook AI Machine Learning Research Paper Explained)
xcit, transformer, attention mechanism

After dominating natural language processing, Transformers have recently taken over computer vision with the advent of Vision Transformers. However, the attention mechanism's quadratic complexity in the number of tokens means that Transformers do not scale well to high-resolution images. XCiT is a new Transformer architecture containing XCA, a transposed version of attention that reduces the complexity from quadratic to linear, and, at least on image data, it appears to perform on par with other models. What does this mean for the field? Is this even a transformer? What really matters in deep learning?

OUTLINE:
0:00 - Intro & Overview
3:45 - Self-Attention vs Cross-Covariance Attention (XCA)
19:55 - Cross-Covariance Image Transformer (XCiT) Architecture
26:00 - Theoretical & Engineering Considerations
30:40 - Experimental Results
33:20 - Comments & Conclusion

Paper:
Code:

Abstract: Following their
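To make the "transposed attention" idea concrete, here is a minimal PyTorch sketch contrasting standard self-attention with cross-covariance attention. The function names, shapes, and the fixed temperature are my own simplifications for illustration, not the authors' reference implementation (which, among other things, learns the temperature and splits the channels into heads).

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v):
    """Standard self-attention over tokens.

    q, k, v: (N, d). The attention map q @ k^T is (N, N), so compute and
    memory grow quadratically with the number of tokens N.
    """
    attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v  # (N, d)

def xca(q, k, v, temperature=1.0):
    """Cross-covariance attention (XCA), the transposed variant.

    q, k, v: (N, d). Queries and keys are L2-normalized along the token
    axis, and the attention map k^T @ q is (d, d) over feature channels,
    so the cost is linear in the number of tokens N and quadratic only in
    the feature dimension d. The temperature is fixed here for simplicity.
    """
    q = F.normalize(q, dim=-2)
    k = F.normalize(k, dim=-2)
    attn = F.softmax(k.transpose(-2, -1) @ q / temperature, dim=-1)  # (d, d)
    return v @ attn  # (N, d)

# Shape check: 196 tokens (a 14x14 patch grid), 64-dim features.
x = torch.randn(196, 64)
print(self_attention(x, x, x).shape, xca(x, x, x).shape)  # both (196, 64)
```

The point of the transposition is that the expensive softmax matrix no longer depends on image size: doubling the resolution quadruples N but leaves the (d, d) attention map unchanged.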