Deep Reader: MLP-Mixer: An all-MLP Architecture for Vision
machinelearning, deeplearning, mlpmixer, multilayerperceptron, objectclassification

Paper | Code (will be available)

Abstract

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (mixing the per-location features), and one with MLPs applied across patches (mixing spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
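To make the two kinds of mixing concrete, here is a minimal sketch of one Mixer block, assuming PyTorch; the module names, hidden sizes, and tensor shapes below are illustrative choices, not the paper's official JAX/Flax implementation.

```python
# Minimal sketch of a Mixer block (hidden sizes and names are assumptions).
import torch
import torch.nn as nn


class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, applied over the last dimension."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class MixerBlock(nn.Module):
    """One Mixer layer: a token-mixing MLP across patches, then a
    channel-mixing MLP within each patch, each with a skip connection."""
    def __init__(self, num_patches, channels, token_hidden=256, channel_hidden=2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = MlpBlock(num_patches, token_hidden)
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = MlpBlock(channels, channel_hidden)

    def forward(self, x):                        # x: (batch, patches, channels)
        y = self.norm1(x).transpose(1, 2)        # (batch, channels, patches)
        y = self.token_mlp(y).transpose(1, 2)    # mix spatial information
        x = x + y
        x = x + self.channel_mlp(self.norm2(x))  # mix per-location features
        return x


# Example: 196 patches (14x14 grid) with 512 channels each.
tokens = torch.randn(8, 196, 512)
out = MixerBlock(num_patches=196, channels=512)(tokens)
print(out.shape)  # torch.Size([8, 196, 512])
```

Stacking several such blocks on top of a per-patch linear embedding, followed by global average pooling and a linear classifier head, yields the full architecture described in the paper.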