Quantifying Assistive Robustness Via the Natural-Adversarial Frontier



RIGID - Robustness via Generative Natural-Adversarial Optimization
We can derive both natural human motions and adversarial human motions. Interestingly, when we jointly optimize for the two objectives, we arrive at human motions that are seemingly natural, yet cause catastrophic robot failures.


Motivations
Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, interacting with the robot outside its training distribution and causing failures. Consider assistive robots, the pinnacle of human-robot interaction: you trained an RL policy to assist someone with a disability in an activity of daily living, e.g. scratching an itch on their arm that they cannot reach themselves. How would you know that this policy is safe to deploy?

As we know, learning-based policies work well in distribution but are prone to adversarial attacks outside it. One way to probe this is to generate adversarial human motions. But do they actually reveal the policy's brittleness? In this case, the adversarial human learns to "hide" their arm so the robot can't reach the itch spot and fails. Since this is unlikely to happen with real users, and preventable with safety measures, the failure is acceptable and does not really expose the policy's brittleness.

What we need is not just adversarial human motion, but motion that is both adversarial and "natural". When we optimize the human motion this way, we find a much more concerning example: a small jerk of the arm sends the robot flailing and spinning around wildly. Since this may well happen in daily interactions, the failure is catastrophic and needs to be addressed.



Computing the Natural-Adversarial Frontier with RIGID

But how natural or unnatural do we allow the motions to be? We propose looking at the entire natural-adversarial Pareto frontier, which represents the robot's performance under all possible types of human motions. Our method, RIGID, provides a way to efficiently compute this Pareto frontier.

Setting $\lambda \rightarrow \infty$ results in a policy $\tilde{\pi}_H$ that closely resembles $\pi_H$ and yields a high task reward. On the other hand, setting $\lambda = 0$ leads to a policy $\tilde{\pi}_H$ that is purely adversarial and harms the assistive task by inverting the environment reward. By sweeping $\lambda \in [0, \infty)$, we obtain a spectrum of human motions that interpolate between adversarial and natural. This matters in human-robot interaction because, while we may not need to worry much about purely adversarial human behaviour (it is unlikely to occur in reality), we do need to take precautions against behaviour that appears natural yet causes robot failures.
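To make the role of $\lambda$ concrete, here is one plausible way to write the trade-off as a single objective. This is a sketch in our own notation: the specific naturalness term (a KL divergence to the nominal human policy $\pi_H$) is an assumption for illustration and may differ from the exact formulation in the paper.

$$
\tilde{\pi}_H^{(\lambda)} \;=\; \arg\max_{\tilde{\pi}_H} \; \underbrace{-\,\mathbb{E}_{\tilde{\pi}_H,\,\pi_R}\Big[\textstyle\sum_t r_{\text{env}}(s_t, a_t)\Big]}_{\text{adversarial term}} \;-\; \lambda\, \underbrace{D_{\mathrm{KL}}\big(\tilde{\pi}_H \,\|\, \pi_H\big)}_{\text{naturalness penalty}}
$$

As $\lambda \rightarrow \infty$ the naturalness penalty dominates and $\tilde{\pi}_H$ collapses onto $\pi_H$; at $\lambda = 0$ only the inverted environment reward remains. Each choice of $\lambda$ then yields one point on the frontier: the naturalness of $\tilde{\pi}_H^{(\lambda)}$ on one axis, and the robot's resulting task performance on the other.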



Generating Edge-Cases for Assistive Robot Policies
Turns out RIGID is quite powerful! We validate through user studies that the RIGID-generated frontier can indeed identify natural failure cases. Compared with a trained expert user, RIGID generates failure cases automatically and more systematically. The expert user can also make the robot fail, but the resulting failure cases tend to be exaggerated and unnatural.


Here we visualize human motions discovered on and below the frontier. Note that the natural failure cases are concentrated in the top-middle part of the curve.



Measuring Robustness
Turns out the Natural-Adversarial Frontier is more powerful than a tool for generating edge cases: we can use its shape to characterize the "robustness" of robot policies. Overall, the smaller the area under the curve, the safer the robot policy is against all naturally adversarial scenarios. In the paper, we use the Natural-Adversarial Frontier to evaluate state-of-the-art methods for robustifying RL policies and find interesting results.
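As a rough illustration of how the frontier's shape can be turned into a single robustness number, here is a minimal sketch. The frontier points, axis conventions, and trapezoidal integration are our assumptions for this example, not the paper's exact evaluation protocol.

```python
import numpy as np

def frontier_auc(naturalness, robot_failure):
    """Area under a natural-adversarial frontier.

    naturalness:   naturalness scores of the adversarial human policies
                   (one point per lambda), assumed normalized to [0, 1].
    robot_failure: the robot's performance drop (or failure rate) caused by
                   each policy, same length as `naturalness`.

    A smaller area means the robot degrades less across the whole spectrum
    from natural to adversarial human behaviour, i.e. a more robust policy.
    """
    order = np.argsort(naturalness)        # integrate along the naturalness axis
    x = np.asarray(naturalness)[order]
    y = np.asarray(robot_failure)[order]
    return np.trapz(y, x)                  # trapezoidal approximation of the area

# Hypothetical frontier points from sweeping lambda (illustrative numbers only).
naturalness   = [0.1, 0.3, 0.5, 0.7, 0.9]
robot_failure = [0.9, 0.8, 0.6, 0.3, 0.1]
print(f"frontier AUC = {frontier_auc(naturalness, robot_failure):.3f}")
```

With this convention, comparing two robot policies amounts to computing the frontier for each and preferring the one with the smaller area.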



CoRL 2023 | Paper | Code | BibTeX