# Reinforcement Learning for Physical Dynamical Systems: An Alternative Approach

## Abstract

The article discusses the role of control theory in modern civilization, its limitations in dealing with nonlinear systems, and the potential of reinforcement learning (RL) and genetic programming (GP) to address these limitations. It presents an experiment comparing a leading RL algorithm, Soft Actor-Critic (SAC), with a GP-based approach on the task of controlling a simple pendulum, both with and without sensor noise.

## Q&A

### [01] Control Theory and its Limitations

**1. What are the key insights provided by physics equations that have enabled the development of control theory?**

- Control theory has been built on insights from physics equations derived from Newton's Laws and Maxwell's equations, which describe the dynamics and interplay of different forces on physical systems.
- These equations allow us to understand how a system moves between states, where a state is the set of variables sufficient to describe the system at a given moment.
- By deriving the equations for a system, we can predict how its state changes through time and space, and express this evolution as a differential equation.
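As an illustration of a state and its differential equation, here is a minimal sketch of a pendulum model. It assumes an idealized point-mass pendulum with the angle `theta` measured from the hanging-down position. The parameter values, the function names, and the semi-implicit Euler integrator are choices made for this example and are not taken from the article.

```python
import math

def pendulum_derivative(state, torque=0.0, g=9.81, length=1.0, mass=1.0):
    """Return d(state)/dt for state = (theta, omega).

    Assumed model: point mass on a massless rod, theta = 0 hanging down.
    """
    theta, omega = state
    # Angular acceleration: gravity pulls toward theta = 0, torque is the input.
    alpha = -(g / length) * math.sin(theta) + torque / (mass * length**2)
    return (omega, alpha)

def euler_step(state, torque=0.0, dt=0.01):
    """Advance the state by one time step using semi-implicit Euler."""
    theta, omega = state
    _, alpha = pendulum_derivative(state, torque)
    omega_next = omega + alpha * dt
    theta_next = theta + omega_next * dt  # uses the updated velocity
    return (theta_next, omega_next)

def simulate(state, steps, torque=0.0, dt=0.01):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = euler_step(state, torque, dt)
        trajectory.append(state)
    return trajectory
```

The state here is the pair `(theta, omega)`: knowing both at one instant, together with the applied torque, is enough to compute what happens next.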

**2. What are the limitations of control theory built around linear systems?**

- Most control theory is built around linear systems, where a proportional change in input leads to a proportional change in output.
- While these linear systems can be quite complex, the progress in controlling complex physical systems has mostly come through finding ways to limit them to linear behavior.
- This can come at the cost of efficiency, as it may require breaking down complex systems into component parts, operating systems at simpler but less efficient modes, or not taking advantage of complex physics.
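One common way of limiting a system to linear behavior is the small-angle approximation, which replaces `sin(theta)` with `theta` in a pendulum's restoring term. The sketch below is an illustration added here, not something from the article. It measures how quickly that approximation degrades as the angle grows.

```python
import math

def small_angle_error(theta):
    """Relative error of approximating sin(theta) by theta (theta in radians)."""
    return abs(theta - math.sin(theta)) / abs(math.sin(theta))

# The linear model is accurate near the equilibrium and poor far from it.
for degrees in (5, 30, 90):
    theta = math.radians(degrees)
    print(f"{degrees:>3} deg: relative error {small_angle_error(theta):.1%}")
```

A controller designed around the linear model is therefore only trustworthy while the system stays close to the operating point, which is the efficiency trade-off described above.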

**3. What are the challenges posed by nonlinear dynamical systems?**

- Nonlinear systems exhibit complex responses to inputs, with small changes in environment or state leading to dramatic variations in behavior.
- Unlike with linear systems, we often cannot easily predict how a nonlinear system will behave as it transitions from one state to the next.
- This presents two key challenges: system identification (understanding how the system will behave at a given state) and system control (determining the input to get the desired outcome).
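To make the system identification step concrete, the following sketch fits a pendulum model to sampled data with ordinary least squares. It is a toy under strong assumptions: the data are noise-free, the "true" dynamics are a point-mass pendulum invented for this example, and the model structure (`a*sin(theta) + b*torque`) is assumed to be known in advance. All names are hypothetical.

```python
import math
import random

def true_acceleration(theta, torque):
    # Hidden ground truth, used only to generate samples (assumed for this example).
    return -9.81 * math.sin(theta) + 1.0 * torque

def identify(samples):
    """Least-squares fit of accel = a*sin(theta) + b*torque.

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for theta, torque, accel in samples:
        x1, x2 = math.sin(theta), torque
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * accel
        r2 += x2 * accel
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b

# Sample random states and inputs, then recover the coefficients from data alone.
rng = random.Random(0)
samples = []
for _ in range(200):
    theta = rng.uniform(-math.pi, math.pi)
    torque = rng.uniform(-2.0, 2.0)
    samples.append((theta, torque, true_acceleration(theta, torque)))

a, b = identify(samples)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Identification is easy here only because the right basis functions were supplied. For a real nonlinear system that structure is usually unknown, which is what makes the problem hard.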

### [02] Reinforcement Learning and Genetic Programming Approaches

**1. How do reinforcement learning (RL) algorithms address the challenges of nonlinear dynamical systems?**

- RL algorithms, such as Soft Actor-Critic (SAC), address the problems of system identification and control optimization by sampling the environment to develop a prediction of which input actions lead to desired outcomes.
- RL algorithms apply a policy of actions based on the system state and refine this policy as they analyze more information on the system, without relying on manipulation and analysis of governing equations.
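The loop of acting on a policy, observing the outcome, and refining the policy can be sketched without any governing-equation analysis on the learner's side. The example below is not SAC. It swaps in a much simpler hill-climbing search over a two-parameter linear policy, purely to show the sample-and-refine pattern. The pendulum model, reward weights, initial state, and all names are assumptions made for this illustration.

```python
import math
import random

def rollout(gains, steps=200, dt=0.05):
    """Run one episode and return the total reward.

    Assumed convention: theta = 0 is upright, so gravity pushes away from 0.
    """
    theta, omega = 0.1, 0.0
    total = 0.0
    for _ in range(steps):
        # Policy: map the current state to an action (torque clipped to +/-2).
        torque = max(-2.0, min(2.0, -gains[0] * theta - gains[1] * omega))
        # Reward: penalize distance from upright, speed, and control effort.
        total -= theta**2 + 0.1 * omega**2 + 0.001 * torque**2
        # Environment step (the learner never inspects these equations).
        omega += (9.81 * math.sin(theta) + torque) * dt
        theta += omega * dt
    return total

def improve_policy(iterations=200, seed=0):
    """Hill climbing: perturb the policy and keep changes that raise the return."""
    rng = random.Random(seed)
    gains = [0.0, 0.0]
    best = rollout(gains)
    history = [best]
    for _ in range(iterations):
        candidate = [g + rng.gauss(0.0, 1.0) for g in gains]
        score = rollout(candidate)
        if score > best:
            gains, best = candidate, score
        history.append(best)
    return gains, history

gains, history = improve_policy()
print(f"return before: {history[0]:.1f}, after: {history[-1]:.1f}")
```

SAC replaces the random perturbation with gradient updates to neural-network actor and critic models, but the outer structure of sampling the environment and refining the policy from the observed rewards is the same.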

**2. What are the limitations of using neural networks as function approximators in RL?**

- Neural networks can have high resource requirements, as estimating the function that maps state-action pairs to expected reward across the whole state space can be computationally intensive.
- Neural networks can also be difficult to explain, limiting the ability to gain insight into the system and use analytical tools developed over centuries of mathematics.

**3. How does the genetic programming (GP) approach differ from neural network-based RL?**

- The GP approach aims to find an expression composed of common mathematical operators (arithmetic, algebraic, and transcendental functions) to approximate the system dynamics, rather than using a neural network as a black-box function approximator.
- This can potentially offer benefits such as increased explainability, easier incorporation of domain knowledge, and better stability, though the scope may be more limited compared to neural networks.
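A sketch of how such expressions might be represented: GP implementations commonly encode each candidate as a tree of operators, variables, and constants. The nested-tuple encoding, operator set, and function names below are assumptions for illustration and do not describe the article's implementation.

```python
import math
import random

# Operator set: name -> (arity, implementation). The selection is illustrative.
OPERATORS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
}
VARIABLES = ("theta", "omega")

def evaluate(expr, env):
    """Evaluate an expression tree; env maps variable names to float values."""
    if isinstance(expr, (int, float)):
        return float(expr)           # constant leaf
    if isinstance(expr, str):
        return env[expr]             # variable leaf
    op, *args = expr
    _, fn = OPERATORS[op]
    return fn(*(evaluate(arg, env) for arg in args))

def random_expr(rng, depth=3):
    """Grow a random tree, as used to seed an initial GP population."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return rng.choice(VARIABLES)
        return round(rng.uniform(-5.0, 5.0), 2)
    op = rng.choice(sorted(OPERATORS))
    arity, _ = OPERATORS[op]
    return (op, *(random_expr(rng, depth - 1) for _ in range(arity)))

def to_string(expr):
    """Render a tree as a readable formula."""
    if not isinstance(expr, tuple):
        return str(expr)
    op, *args = expr
    return f"{op}({', '.join(to_string(arg) for arg in args)})"

# A hand-written candidate controller: torque = -10*theta - 2*omega.
controller = ("sub", ("mul", -10.0, "theta"), ("mul", 2.0, "omega"))
print(to_string(controller))
print(to_string(random_expr(random.Random(1))))
```

Because each candidate is an explicit formula, it can be printed, simplified, or compared with known physics, which is where the explainability advantage comes from. A full GP loop would add fitness evaluation, selection, crossover, and mutation on top of this representation.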

**4. How did the GP and SAC algorithms perform in the experiments with the pendulum system?**

- In the experiments, the GP algorithm outperformed the SAC algorithm in computational efficiency, converging to a solution with far less computation.
- This was observed both in the default pendulum gymnasium environment and in the environment with added sensor noise.
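The summary does not say how the sensor noise was modeled. One plausible setup, shown here purely as an assumption, is to add independent zero-mean Gaussian noise to each entry of the observation before the controller sees it. The observation layout `(cos(theta), sin(theta), omega)` matches the Gymnasium pendulum environment. The function name and noise level are hypothetical.

```python
import math
import random

def noisy_observation(theta, omega, sigma, rng):
    """Return (cos(theta), sin(theta), omega) with Gaussian noise on each entry.

    sigma is the noise standard deviation; sigma = 0 gives the clean reading.
    """
    clean = (math.cos(theta), math.sin(theta), omega)
    return tuple(value + rng.gauss(0.0, sigma) for value in clean)

rng = random.Random(0)
print(noisy_observation(0.0, 1.0, 0.0, rng))   # clean reading
print(noisy_observation(0.0, 1.0, 0.1, rng))   # perturbed reading
```

In a Gymnasium-based experiment, logic like this would typically live in an observation wrapper so that the underlying dynamics stay unchanged and only the controller's view of the state is corrupted.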

**5. What are the potential advantages and limitations of the GP approach compared to neural network-based RL?**

- Potential advantages of GP include increased explainability, easier incorporation of domain knowledge, and better stability, but it may be limited in scope compared to the universal function approximation capabilities of neural networks.
- Neural network-based RL approaches benefit from extensive optimization and maturity, but the article suggests that further development of the GP approach could lead to significant improvements in efficiency and performance.