How to Choose a Reinforcement-Learning Algorithm
🌈 Abstract
The article provides a structured overview of existing reinforcement learning (RL) algorithms and action-distribution families, along with guidelines for choosing methods appropriate to a given situation. Its aim is to streamline the selection of an RL algorithm and action-distribution family.
🙋 Q&A
[01] RL Algorithms
1. What are the key properties of RL algorithms covered in the article? The article covers various properties of RL algorithms, including:
- On-/off-policy
- Value-based, policy-based, or actor-critic
- Value-function learning approaches (e.g. TD, MC, eligibility traces; see the sketch after this list)
- Entropy regularization
- Distributional
- Distributed
- Hierarchical
- Imitation learning
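To make the value-function learning approaches above concrete, here is a minimal sketch contrasting a Monte-Carlo target (the full discounted return observed from a step onward) with a one-step TD(0) target (bootstrapped from the current value estimate). The data layout (`rewards`, `next_states`, value table `V`) is an illustrative assumption, not taken from the article.

```python
def mc_target(rewards, t, gamma=0.99):
    # Monte-Carlo target for V(s_t): discounted sum of all rewards from step t to episode end.
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))

def td0_target(rewards, next_states, t, V, gamma=0.99):
    # TD(0) target for V(s_t): one observed reward plus the bootstrapped estimate V(s_{t+1}).
    return rewards[t] + gamma * V[next_states[t]]
```

Eligibility traces (as in TD(λ)) interpolate between these two extremes by weighting multi-step returns.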
2. How does the article help in choosing an appropriate RL algorithm? The article provides tables that relate environment/task properties to desirable algorithm properties, which helps in selecting an RL algorithm suited to a given situation.
3. What are the key considerations for deciding between model-free and model-based RL? The article states that model-free RL algorithms aim to learn a policy/value function directly from interaction, while model-based RL algorithms model the environment dynamics explicitly. The choice depends on factors like the feasibility of learning the environment model, the primary goal (e.g. training stability, data efficiency), and whether the environment dynamics are known in advance.
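A minimal sketch of this distinction, with hypothetical interfaces: the model-free update touches only the observed transition, whereas the model-based agent queries a dynamics model (learned, or known in advance) to evaluate actions by simulated one-step lookahead.

```python
# Model-free: tabular Q-learning update from a single observed transition (s, a, r, s2).
# Q is assumed to be a dict mapping (state, action) pairs to value estimates.
def q_learning_update(Q, actions, s, a, r, s2, alpha=0.1, gamma=0.99):
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Model-based: a dynamics model lets the agent evaluate actions by simulated
# lookahead instead of relying solely on stored real experience.
def greedy_action_with_model(model, value_fn, s, actions, gamma=0.99):
    def lookahead(a):
        s2, r = model(s, a)  # model(s, a) -> (next_state, reward) is a hypothetical interface
        return r + gamma * value_fn(s2)
    return max(actions, key=lookahead)
```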
4. What are the key considerations for deciding between hierarchical and non-hierarchical RL? Hierarchical RL is recommended for highly complex action sequences that can be divided into sub-routines at different levels of granularity. Non-hierarchical approaches are preferred if action sequences are at most moderately complex.
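A rough sketch of the hierarchical idea, with illustrative names and a gym-style environment interface assumed: a high-level policy selects a sub-routine (e.g. a sub-goal such as "walk to the door"), and the chosen sub-policy then emits primitive actions at a finer granularity before control returns to the high level.

```python
# Two-level hierarchical control loop (interfaces are illustrative, not from the article).
def run_hierarchical_policy(env, high_level, sub_policies, episode_len=1000, k=10):
    obs = env.reset()
    for _ in range(episode_len // k):
        goal = high_level.select(obs)          # high level picks a sub-routine / sub-goal
        sub_policy = sub_policies[goal]
        for _ in range(k):                     # sub-policy acts for k primitive steps
            action = sub_policy.act(obs, goal)
            obs, reward, done, info = env.step(action)
            if done:
                return
```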
5. When is it beneficial to combine RL with imitation learning? Combining RL with imitation learning is recommended if an expert is available, since expert demonstrations can improve both the training process and final performance. If no expert is available, pure RL without imitation learning is used.
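One common way to combine the two, sketched here under the assumption of a PyTorch policy network with discrete action logits and a hypothetical `expert_loader` yielding (observation, expert action) batches: pretrain the policy by behaviour cloning on expert data, then hand it to a standard RL algorithm for fine-tuning.

```python
import torch

def pretrain_with_behaviour_cloning(policy, expert_loader, epochs=10, lr=1e-3):
    # Supervised pretraining: make the policy imitate the expert's actions.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, expert_action in expert_loader:
            loss = loss_fn(policy(obs), expert_action)   # policy(obs) -> action logits
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Afterwards, continue training `policy` with an ordinary RL algorithm.
```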
[02] Action-Distribution Families
1. What are the key action-distribution families covered for value-based RL algorithms? The article discusses value-based action-distribution families such as greedy, ε-greedy, Boltzmann exploration, randomized value functions, and noisy nets.
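For concreteness, a small sketch of two of these families over a vector of Q-values: ε-greedy mixes uniform random exploration with greedy exploitation, while Boltzmann (softmax) exploration samples actions in proportion to exp(Q / temperature). The NumPy interface is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon pick a uniformly random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / temperature).
    logits = np.asarray(q_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())     # shift by the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))
```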
2. What are the key parametric action-distribution families covered for policy-based and actor-critic RL algorithms? The article discusses parametric action-distribution families such as categorical, Gaussian, Gaussian mixture, normalizing flows, stochastic networks/black-box policies, tanh, beta distribution, deterministic, and added noise.
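As one concrete example from this list, a tanh-squashed Gaussian is commonly used for continuous, bounded action spaces (for instance in SAC-style actor-critic methods). The sketch below assumes NumPy and illustrative mean/log-std outputs from a policy network.

```python
import numpy as np

rng = np.random.default_rng()

def sample_tanh_gaussian(mean, log_std):
    # Sample from a Gaussian and squash with tanh so actions lie in (-1, 1).
    # (Log-probabilities for training would need a tanh change-of-variables correction.)
    pre_squash = mean + np.exp(log_std) * rng.standard_normal(np.shape(mean))
    return np.tanh(pre_squash)
```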
3. How does the article guide the selection of action-distribution families? The article provides considerations on factors like stochasticity, expressiveness, action space, and the environment/task properties to help choose an appropriate action-distribution family.
4. What is the difference between simple and expressive action-distribution families? Simple action-distribution families have limited expressiveness, e.g. Gaussian or beta distributions, while expressive families can model complex, multi-modal distributions, e.g. Gaussian mixtures or normalizing flows.
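To illustrate the gap in expressiveness, the sketch below samples from a two-component Gaussian mixture, which can place probability mass on two separate modes, something a single Gaussian cannot represent. The weights, means, and standard deviations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def sample_gaussian_mixture(weights, means, stds):
    # Pick a mixture component, then sample from that component's Gaussian.
    k = rng.choice(len(weights), p=weights)
    return rng.normal(means[k], stds[k])

# Example: a bimodal action distribution, e.g. "steer left" vs. "steer right".
action = sample_gaussian_mixture(weights=[0.5, 0.5], means=[-0.8, 0.8], stds=[0.1, 0.1])
```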