Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)
#expirespan #nlp #facebookai

Facebook AI (FAIR) researchers present Expire-Span, a variant of Transformer XL that dynamically assigns expiration dates to previously encountered signals. Because of this, Expire-Span can handle sequences of many thousands of tokens while keeping memory and compute requirements at a manageable level. It matches or outperforms baseline systems while consuming far fewer resources. We discuss its architecture, advantages, and shortcomings.

OUTLINE:
0:00 Intro & Overview
2:30 Remembering the past in sequence models
5:45 Learning to expire past memories
8:30 Difference to local attention
10:00 Architecture overview
13:45 Comparison to Transformer XL
18:50 Predicting expiration masks
32:30 Experimental Results
40:00 Conclusion & Comments

Paper:
Code:

ADDENDUM: I mention several times that the gradient signal of the e quantity only occurs inside the R ramp. By th