Automated Deep Reinforcement Learning Environment for Hardware of a Modular Legged Robot
In this paper, we present an automated learning environment for developing control policies directly on the hardware of a modular legged robot. This environment facilitates the reinforcement learning process by computing the rewards using a vision-based tracking system and relocating the robot to the initial position using a resetting mechanism. We employ two state-of-the-art deep reinforcement learning (DRL) algorithms, Trust Region Policy Optimization (TRPO) and Deep Deterministic Policy Gradient (DDPG),