## Foundations of Interpretable and Reliable Machine Learning

### 1. Introduction

Question we address: How can we develop physics-informed reinforcement learning algorithms that guarantee safety and interpretability?

It is widely known that modern state-of-the-art reinforcement learning algorithms (DDPG, SAC, TD3, PPO) suffer from serious issues, including:

• the trained policies are unstable and brittle with respect to perturbations,
• the trained policies are challenging to transfer,
• for mission-critical applications, it is hard to provide safety guarantees (e.g., constraint satisfaction),
• the trained policies (usually artificial neural networks) are hard to interpret.

Deploying reinforcement learning in safety-critical industrial applications and real-life scenarios requires developing new approaches or significantly improving existing ones. The critical applications we have in mind include:

• Self-driving cars, developed by most of the major car makers
• Autonomous spacecraft, developed for example by NASA
• Robotic arms developed for surgery

### 4. Computational Research

There are many opportunities for research on the SPP methods; below we present the most promising ones, each of which also offers potential for interesting experiments.

• Interpretability -- the target states output by the policy make the actor's actions predictable; to demonstrate interpretability experimentally, we plan to run experiments in the Ant-Maze environment,
• Safe RL -- train policies that guarantee safety, i.e. the agent acts under safety constraints: it does not enter an unsafe region and it avoids moving enemies; see Fig. 4 for example environments,
• Transfer -- transfer to a different, even slightly modified, task or environment is a formidable challenge for classical RL methods; the policy transfer issue can be addressed using our SPP approach,
• Physics-informed architectures -- develop policies using more relevant, 'physics-informed' neural architectures such as Neural ODEs or SIREN,
• New environment -- create a testbed for interpretable and safe RL methods; see Fig. 5 below.
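
To make the interpretability and safety items above more concrete, the following is a minimal sketch, not the SPP implementation itself. It assumes a 2D point agent, reads the mapping $a = CM(s, z)$ as a simple proportional controller, and models the unsafe region as a disc. The functions `policy_target_state`, `is_safe`, and `control_module`, and all their parameters, are hypothetical names introduced only for illustration. The point is that the policy output is itself a state, so it can be inspected and checked against a constraint before any action is taken.

```python
import numpy as np

def policy_target_state(state, goal, max_step=0.5):
    """Hypothetical policy: propose a target state at most max_step toward the goal."""
    direction = goal - state
    dist = np.linalg.norm(direction)
    if dist <= max_step:
        return goal.copy()
    return state + max_step * direction / dist

def is_safe(target, unsafe_center, unsafe_radius):
    """Assumed constraint: the target state must lie outside a disc-shaped unsafe region."""
    return bool(np.linalg.norm(target - unsafe_center) > unsafe_radius)

def control_module(state, target, gain=1.0, max_action=1.0):
    """Assumed stand-in for CM(s, z): a clipped proportional controller."""
    return np.clip(gain * (target - state), -max_action, max_action)

state = np.array([0.0, 0.0])
goal = np.array([2.0, 0.0])

# The policy output is a human-readable state, not an opaque action vector.
target = policy_target_state(state, goal)

# The proposed target can be checked against the unsafe region before acting.
safe = is_safe(target, unsafe_center=np.array([0.5, 0.1]), unsafe_radius=0.2)

# Only the control module turns the (state, target) pair into an action.
action = control_module(state, target)
print(target, safe, action)
```

In this toy setup the proposed target falls inside the unsafe disc, so a safety layer could reject or project it before the control module is ever called; how such a check would be integrated into training is left open here.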

### 5. Theoretical Research

• Formal verification -- we can attempt a formal verification of trained policies, i.e. mathematically verify that the policy will satisfy the safety constraints within some fixed time horizon,
• Theoretical guarantees -- provide convergence guarantees for the Bellman iterates of our variant of the Q-function $Q^{\pi, CM}(s_t, a_t) = \mathbb{E}_{r_{i\ge t},\, s_{i>t}\sim E,\ z_{i>t}\sim \pi,\ a_i = CM(s_i,z_i)}\left[R_t \mid s_t, a_t\right]$,
• Hamilton-Jacobi-Bellman safety -- combine the SPP approach with the Hamilton-Jacobi-Bellman algorithm for RL safety, in which a variational method is combined with $Q$-function optimization in order to maximize the return under safety constraints: $0 = \min\left\{l(x)-V(x,t),\ \frac{\partial V}{\partial t}+\max_{u\in\mathcal{U}} \nabla_x V^{\top} f(x,u)\right\}$.
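
As an illustration of the variational inequality in the last item, the sketch below solves a discrete-time analogue on a 1D grid, using the backward recursion $V_k(x) = \min\{l(x), \max_{u} V_{k+1}(x + \Delta t\, f(x,u))\}$ with terminal condition $V(x,T) = l(x)$. Everything specific here is an assumption made for the example: the toy dynamics $f(x,u) = x + u$, the safety margin $l(x) = 1 - |x|$, the control set $[-0.5, 0.5]$, the explicit Euler step, and the linear interpolation. It is not the algorithm proposed above, only a small numerical instance of the same safety value function.

```python
import numpy as np

def safety_value(l, f, xs, controls, horizon, dt):
    """Backward recursion V_k(x) = min(l(x), max_u V_{k+1}(x + dt * f(x, u)))."""
    l_vals = l(xs)
    V = l_vals.copy()  # terminal condition V(x, T) = l(x)
    for _ in range(int(round(horizon / dt))):
        # Value at the Euler successor state for each candidate control,
        # linearly interpolated on the grid (clamped at the grid boundary).
        next_vals = np.stack(
            [np.interp(xs + dt * f(xs, u), xs, V) for u in controls]
        )
        V = np.minimum(l_vals, next_vals.max(axis=0))
    return V

# Assumed toy problem: unstable drift x' = x + u with bounded control.
xs = np.linspace(-2.0, 2.0, 201)
controls = np.linspace(-0.5, 0.5, 11)
l = lambda x: 1.0 - np.abs(x)   # l(x) >= 0 marks the safe set |x| <= 1
f = lambda x, u: x + u

V0 = safety_value(l, f, xs, controls, horizon=1.0, dt=0.01)

# States with V0 >= 0 can be kept inside the safe set over the whole horizon.
print(np.interp(0.25, xs, V0), np.interp(0.9, xs, V0))
```

In this toy problem the control can cancel the drift only for $|x| \le 0.5$, so a state such as $x = 0.9$ is inside the safe set at time zero yet receives a negative value: it is bound to leave the safe set within the horizon. This is the kind of information a safety-aware policy could use to restrict its proposed target states.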