ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning (Paper Explained)
ext5, transferlearning, exmix

The T5 model has been a staple of NLP research for the past several years. Both its size and its approach of formulating all NLP tasks as prompt-based language modeling make it a convenient choice for tackling new challenges, and it provides a strong baseline for most current datasets. ExT5 pushes T5 to its limits by pre-training not only on self-supervised mask filling, but simultaneously on 107 different supervised NLP tasks, collected in the new ExMix dataset. The resulting model compares very favorably to T5 when fine-tuned on downstream tasks.

OUTLINE:
0:00 - Intro & Overview
2:15 - Recap: The T5 model
3:55 - The ExT5 model and task formulations
8:10 - The ExMix dataset
9:35 - Do different tasks help each other?
16:50 - Which tasks should we include?
20:30 - Pre-training vs. pre-finetuning
23:00 - A few hypotheses about what's going on
27:20 - How much self-supervised data to use
34:15 - More experimental results
38:40 - Conclusion & Summary

Paper: Abs
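As a rough illustration of the idea discussed in the video, here is a minimal Python sketch of how a multi-task text-to-text mixture can be assembled: a self-supervised span-corruption task and a supervised task are rendered in the same input/target format and then sampled from one weighted mixture. All function names, prompt formats, and sampling weights below are invented for illustration; they are not taken from the paper or its code.

```python
import random

def span_corruption_example(text: str) -> dict:
    """Toy C4-style span corruption: replace one word with a sentinel token."""
    words = text.split()
    i = random.randrange(len(words))
    masked = " ".join(words[:i] + ["<extra_id_0>"] + words[i + 1:])
    return {"input": masked, "target": f"<extra_id_0> {words[i]}"}

def nli_example(premise: str, hypothesis: str, label: str) -> dict:
    """A supervised task (NLI) rendered in the same text-to-text format."""
    return {
        "input": f"nli premise: {premise} hypothesis: {hypothesis}",
        "target": label,
    }

# A mixture is a list of (example generator, sampling weight) pairs.
# ExT5 mixes self-supervised span corruption with many supervised tasks;
# only two entries are shown here, with made-up weights.
MIXTURE = [
    (lambda: span_corruption_example(
        "the quick brown fox jumps over the lazy dog"), 0.8),
    (lambda: nli_example(
        "A man is sleeping.", "A person rests.", "entailment"), 0.2),
]

def sample_batch(n: int) -> list[dict]:
    """Draw n training examples from the weighted task mixture."""
    gens, weights = zip(*MIXTURE)
    return [random.choices(gens, weights=weights)[0]() for _ in range(n)]

if __name__ == "__main__":
    for example in sample_batch(4):
        print(example)
```

In the actual ExT5 setup the mixture spans the 107 supervised ExMix tasks plus the self-supervised objective, and the video discusses how the ratio of self-supervised to supervised data matters (27:20).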