Intervention Without Notice

We have proposed a framework that allows a human operator to safely and repeatedly interrupt a reinforcement learning agent, while ensuring the agent will not learn to prevent or induce these interruptions.

Safe interruptibility can be useful for taking control of a robot that is misbehaving and may cause irreversible consequences, for removing it from a delicate situation, or even for temporarily using it to perform a task it did not learn or would not normally be rewarded for.
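A minimal sketch of the idea (the function and environment below are hypothetical, chosen only for illustration): the operator's interruption is modeled as overriding the agent's chosen action, and an off-policy learner such as Q-learning updates on whatever action was actually executed, using the max over next-state values as its target. Because that target does not depend on how the executed action was selected, repeated interruptions do not teach the agent that being interrupted is costly.

```python
import random

def q_learning_with_interruptions(n_states=5, episodes=2000,
                                  alpha=0.1, gamma=0.9, eps=0.1,
                                  interrupt_prob=0.3):
    """Q-learning on a small chain MDP with operator interruptions."""
    # States 0..n_states-1; actions: 0 = left, 1 = right.
    # Reward 1.0 only on reaching the final state.
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(qs):
        m = max(qs)
        return random.choice([a for a, v in enumerate(qs) if v == m])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = random.randrange(2) if random.random() < eps else greedy(Q[s])
            # Operator interruption: with some probability, override the
            # agent's choice with a "safe" action (here: move left).
            if random.random() < interrupt_prob:
                a = 0
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Off-policy target: max over next actions, independent of how
            # the executed action was chosen, so forced interruptions do
            # not bias the learned values against being interrupted.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

Even with frequent interruptions during training, the learned values still favor moving right in every non-terminal state, the same policy an uninterrupted agent would learn.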

One important future prospect is to consider scheduled interruptions, where the agent is either interrupted every night at 2am for one hour, or given advance notice that an interruption will happen at a precise time and last for a specified period.

For these types of interruptions, we want not only for the agent not to resist being interrupted, but also for it to take measures regarding its current tasks so that the scheduled interruption has minimal negative effect on them. This may require a completely different solution.


A catch with reinforcement learning is that human programmers may not anticipate every possible way to reach a given reward. A learning agent might discover a shortcut that maximizes reward for the machine but turns out to be very undesirable for humans. Programmers can tweak the learning algorithm to account for such cases, but they eventually risk nullifying the reward function completely.
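As a toy illustration of such a shortcut (the MDP below is entirely hypothetical): suppose finishing a task properly yields a one-off reward of 10, while a loophole action games the reward signal for 2 per step, forever. Value iteration shows which behavior a reward-maximizing agent actually prefers.

```python
def best_policy(gamma=0.9, iters=500):
    # Hypothetical toy MDP: from state 'work', the intended action
    # completes the task (reward 10, episode ends), while the exploit
    # action earns a small repeatable reward and never finishes.
    V = {"work": 0.0, "done": 0.0}
    actions = {
        "intended": (10.0, "done"),  # finish the task properly
        "exploit":  (2.0, "work"),   # shortcut: small but repeatable
    }
    # Value iteration on the single non-terminal state.
    for _ in range(iters):
        V["work"] = max(r + gamma * V[s2] for r, s2 in actions.values())
    q = {a: r + gamma * V[s2] for a, (r, s2) in actions.items()}
    return max(q, key=q.get)
```

With gamma = 0.9, the exploit loop is worth 2 / (1 - 0.9) = 20 versus 10 for the intended behavior, so the optimal policy games the reward rather than doing the task.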