In this talk I discuss how estimating and propagating epistemic uncertainty benefits generalization and deep exploration in reinforcement learning (RL) by focusing on two recent contributions.
First, I consider model-free distributional RL, which aims to learn the distribution of returns rather than their expected value. Typically, this involves a projection of the unconstrained distribution onto a set of representable, parametric distributions. To facilitate reliable estimation of epistemic uncertainty through diversity, we study the combination of several different projections and representations in a distributional ensemble. We establish theoretical properties of such projection ensembles and derive an algorithm that uses ensemble disagreement as a bonus for deep exploration.
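To make the idea of disagreement between projections concrete, here is a minimal sketch and not the algorithm presented in the talk. It projects one discrete return distribution onto two representations, a categorical one with fixed atoms and a quantile one with fixed quantile levels, and scores their disagreement with the 1-Wasserstein distance. The choice of distance, the function names, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def categorical_projection(values, probs, support):
    """Project a discrete distribution onto fixed, evenly spaced atoms."""
    vmin, dz = support[0], support[1] - support[0]
    out = np.zeros(len(support))
    # Fractional position of each value on the atom grid (clipped to its range).
    b = (np.clip(values, vmin, support[-1]) - vmin) / dz
    lo = np.floor(b).astype(int)
    hi = np.minimum(lo + 1, len(support) - 1)
    w_hi = b - lo
    # Split each probability mass between the two neighbouring atoms.
    np.add.at(out, lo, probs * (1.0 - w_hi))
    np.add.at(out, hi, probs * w_hi)
    return out

def quantile_projection(values, probs, n_quantiles):
    """Represent the distribution by its quantiles at midpoint levels."""
    order = np.argsort(values)
    v, cdf = values[order], np.cumsum(probs[order])
    taus = (2 * np.arange(n_quantiles) + 1) / (2 * n_quantiles)
    idx = np.minimum(np.searchsorted(cdf, taus), len(v) - 1)
    return v[idx]

def wasserstein_1(x, px, y, py):
    """1-Wasserstein distance: integral of |F_x - F_y| over the real line."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.array([px[x <= g].sum() for g in grid])
    cdf_y = np.array([py[y <= g].sum() for g in grid])
    return float(np.sum(np.abs(cdf_x - cdf_y)[:-1] * np.diff(grid)))

def disagreement_bonus(values, probs, support, n_quantiles):
    """Hypothetical exploration bonus: distance between the two projections."""
    cat_probs = categorical_projection(values, probs, support)
    q_locs = quantile_projection(values, probs, n_quantiles)
    q_probs = np.full(n_quantiles, 1.0 / n_quantiles)
    return wasserstein_1(support, cat_probs, q_locs, q_probs)

# Illustrative target distribution (made-up numbers).
values, probs = np.array([0.2, 1.7]), np.array([0.5, 0.5])
support = np.linspace(0.0, 2.0, 5)
bonus = disagreement_bonus(values, probs, support, n_quantiles=4)
```

In this toy setting the two projections agree exactly when the target is a point mass on an atom, so the bonus vanishes there and grows when the representations distort the target differently. In an actual agent the ensemble members would be learned networks, and their disagreement would also reflect how much data each state-action pair has seen.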
Second, I discuss how propagated epistemic uncertainty estimates can be leveraged in model-based RL by embedding them in Monte-Carlo Tree Search (MCTS). We develop a method for propagating epistemic uncertainty through MCTS, enabling agents to estimate the epistemic uncertainty of their predictions. Furthermore, we use the propagated uncertainty in a novel deep exploration algorithm that explicitly plans to explore. We show that our algorithm helps address the challenges of dedicated deep exploration and of reliability in the face of the unknown.
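One simple way to picture uncertainty propagation in a search tree is sketched below. It is an illustration under stated assumptions rather than the method from the talk: it assumes the learned model supplies an epistemic variance for each predicted reward and for the leaf value, treats these as independent so that variances add with a squared discount, and averages the backed-up variances over visits. The `Node` class, `backup`, and `exploration_score` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    reward: float = 0.0       # predicted reward on the edge into this node
    reward_var: float = 0.0   # assumed epistemic variance of that prediction
    value_sum: float = 0.0
    var_sum: float = 0.0
    visits: int = 0

    @property
    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

    @property
    def variance(self) -> float:
        return self.var_sum / self.visits if self.visits else 0.0

def backup(path, leaf_value, leaf_var, gamma=0.99):
    """Back up the return and its epistemic variance from leaf to root."""
    g, v = leaf_value, leaf_var
    for node in reversed(path):
        g = node.reward + gamma * g
        # Independence assumption: Var[r + gamma * V] = Var[r] + gamma^2 * Var[V].
        v = node.reward_var + gamma ** 2 * v
        node.value_sum += g
        node.var_sum += v
        node.visits += 1

def exploration_score(node, beta=1.0):
    """Optimistic score: value estimate plus scaled epistemic standard deviation."""
    return node.value + beta * math.sqrt(node.variance)

# Illustrative two-step path with made-up numbers.
path = [Node(reward=1.0, reward_var=0.1), Node(reward=0.0, reward_var=0.0)]
backup(path, leaf_value=2.0, leaf_var=0.4, gamma=0.5)
```

Selecting actions by such a score steers the search toward branches whose predictions are uncertain, which is one way to read "planning to explore"; the weight `beta` is an assumed tuning parameter.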