Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained)
#decisiontransformer #reinforcementlearning #transformer

Proper credit assignment over long timespans is a fundamental problem in reinforcement learning. Even methods designed to combat this problem, such as TD-learning, quickly reach their limits when rewards are sparse or noisy. This paper reframes offline reinforcement learning as a pure sequence modeling problem, in which actions are sampled conditioned on the given history and the desired future rewards. This allows the authors to apply recent advances in sequence modeling with Transformers and achieve competitive results on offline RL benchmarks.

OUTLINE:
0:00 Intro & Overview
4:15 Offline Reinforcement Learning
10:10 Transformers in RL
14:25 Value Functions and Temporal Difference Learning
20:25 Sequence Modeling and Reward-to-go
27:20 Why this is ideal for offline RL
31:30 The context length problem
34:35 Toy example: Shortest path from random walks
41:00 Discount factors
45:50 Experimental Results
49:25 Do you need to know the be
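
To make the "desired future rewards" idea concrete, here is a minimal sketch of how a logged episode could be turned into the kind of sequence described above. It assumes undiscounted reward-to-go (the sum of all rewards from a timestep to the end of the episode) and an interleaved (reward-to-go, state, action) ordering. The function names `returns_to_go` and `build_sequence` and the toy episode are illustrative only and are not taken from the authors' code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry is the reward at t plus everything after it.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_sequence(
    states: Sequence[Any], actions: Sequence[Any], rewards: Sequence[float]
) -> List[Tuple[str, Any]]:
    """Interleave (reward-to-go, state, action) per timestep (assumed token layout)."""
    seq: List[Tuple[str, Any]] = []
    for r, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("return_to_go", r), ("state", s), ("action", a)])
    return seq


# Hypothetical toy episode with sparse rewards.
rewards = [0.0, 0.0, 1.0, 0.0, 2.0]
states = ["s0", "s1", "s2", "s3", "s4"]
actions = ["a0", "a1", "a2", "a3", "a4"]

print(returns_to_go(rewards))  # [3.0, 3.0, 3.0, 2.0, 2.0]
print(build_sequence(states, actions, rewards)[:3])
```

A sequence model trained on such data predicts each action from the tokens before it. At test time, the first reward-to-go token would be set to the return you want to achieve rather than one observed in the data.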