Robust Fine-Tuning of Zero-Shot Models
Researchers from the University of Washington, Columbia University, OpenAI, the Allen Institute for Artificial Intelligence, and Toyota Research have teamed up to present a new method for fine-tuning pre-trained models such as GPT-3, BERT, DALL-E, EfficientNet, or CLIP on application-specific datasets. The key insight is that as you fine-tune these models, you gain in-distribution accuracy but sacrifice the zero-shot flexibility, or out-of-distribution generalization, of the pre-trained foundation model. The authors present weight-space ensembling, in which you take a linear interpolation between the weights of the zero-shot model and the fine-tuned model and run inference with the interpolated weights. This achieves a balance between in-distribution and out-of-distribution accuracy. The authors connect this to linear mode connectivity to explain why it works, in contrast to random weight-space ensembles, which do not. This is another very interesting study on the generalization capability of deep neural networks. This includes solving problems o
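To make the idea concrete, here is a minimal sketch of weight-space ensembling in PyTorch. It assumes two models with identical architectures, `zeroshot_model` and `finetuned_model`, and the names `weight_space_ensemble` and `alpha` are hypothetical choices for illustration (alpha = 0 keeps the zero-shot weights, alpha = 1 keeps the fine-tuned weights); this is a sketch of the general technique, not the authors' exact implementation.

```python
import copy
import torch

def weight_space_ensemble(zeroshot_model: torch.nn.Module,
                          finetuned_model: torch.nn.Module,
                          alpha: float = 0.5) -> torch.nn.Module:
    """Return a model whose weights linearly interpolate between the two inputs."""
    zs_state = zeroshot_model.state_dict()
    ft_state = finetuned_model.state_dict()

    merged_state = {}
    for name, zs_param in zs_state.items():
        ft_param = ft_state[name]
        if zs_param.is_floating_point():
            # Linear interpolation in weight space:
            # (1 - alpha) * zero-shot weights + alpha * fine-tuned weights
            merged_state[name] = (1.0 - alpha) * zs_param + alpha * ft_param
        else:
            # Non-float buffers (e.g. integer counters) are copied from the fine-tuned model.
            merged_state[name] = ft_param

    merged_model = copy.deepcopy(zeroshot_model)
    merged_model.load_state_dict(merged_state)
    return merged_model
```

In practice you would sweep alpha over a held-out set and pick the value that gives the desired trade-off between in-distribution and out-of-distribution accuracy; note that unlike output-space ensembles, the merged model costs no extra compute at inference time.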