MLP-Mixer: An all-MLP Architecture for Vision
mixer, google, imagenet

Convolutional Neural Networks have dominated computer vision for nearly 10 years, and that might finally come to an end. First, Vision Transformers (ViT) showed remarkable performance, and now even simple MLP-based models reach competitive accuracy, as long as sufficient data is used for pre-training. This paper presents MLP-Mixer, which uses MLPs in a particular weight-sharing arrangement to achieve a competitive, high-throughput model, and it raises some interesting questions for future research about the nature of learning and inductive biases and their interaction with scale. A minimal code sketch of one Mixer layer follows after the abstract.

OUTLINE:
0:00 Intro & Overview
2:20 MLP-Mixer Architecture
13:20 Experimental Results
17:30 Effects of Scale
24:30 Learned Weights Visualization
27:25 Comments & Conclusion

Paper:

Abstract:
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In
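To make the "MLPs in a particular weight-sharing arrangement" concrete, here is a minimal PyTorch sketch of a single Mixer layer: a token-mixing MLP applied across patches (its weights shared over all channels) followed by a channel-mixing MLP applied across features (its weights shared over all patches), each with LayerNorm and a skip connection. This is not the paper's official code; the class names are placeholders and the sizes are illustrative (roughly Mixer-B/16 on a 224x224 input).

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, applied along the last dimension."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer layer: token mixing across patches, then channel mixing across features."""
    def __init__(self, num_patches, channels, token_hidden, channel_hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        # Shared across channels: the same MLP mixes the patch dimension for every channel.
        self.token_mlp = MlpBlock(num_patches, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        # Shared across patches: the same MLP mixes the channel dimension for every patch.
        self.channel_mlp = MlpBlock(channels, channel_hidden)

    def forward(self, x):                             # x: (batch, patches, channels)
        y = self.norm1(x).transpose(1, 2)             # (batch, channels, patches)
        x = x + self.token_mlp(y).transpose(1, 2)     # token mixing + skip connection
        x = x + self.channel_mlp(self.norm2(x))       # channel mixing + skip connection
        return x

# Hypothetical sizes, roughly Mixer-B/16: 14x14 = 196 patches, 768 channels.
block = MixerBlock(num_patches=196, channels=768, token_hidden=384, channel_hidden=3072)
out = block(torch.randn(2, 196, 768))                 # -> shape (2, 196, 768)
```

Note that because the token-mixing MLP sees every channel with the same weights, its parameter count depends only on the number of patches, not on the channel width; this is the weight sharing that keeps the model cheap relative to attention.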