Hey robot, why are you so smart?

Published:

Were you lured here by the catchy name? Lucky you, this is gonna be my last proper Gaussian Process (GP) post. Rest assured, I did not put "robot" in the title just to grab your attention; I really do intend to explain in this post how to use a GP to "teach" a robot to use its arm properly.

People have long been looking for smart ways to have robots mimic human behaviour. Here, I'm gonna explain how we train a robot to use its right arm to reach an object, and how all of this can be done online with the guidance of a GP. I hope my previous posts have convinced you that GPs are really flexible and powerful. Their main weakness is computational complexity: the cost of exact inference scales cubically with the number of data points, which is not something we can ignore when we need to make predictions online. So, in the robot learning setting, people have suggested smart ways of getting around this issue while keeping the predictive power of GPs. Local Gaussian Process Regression (LGP), among many other methods, has been shown to do very well in terms of both prediction accuracy and speed in the online robot learning setting. This method is a team of two giants: Gaussian Process regression and Locally Weighted Projection Regression (LWPR). Each compensates for the other's weakness; together they are invincible!

First I'll give you a quick refresher on the key points of Gaussian Process Regression (GPR). We model our observed targets with a Gaussian Process:

$latex \textbf{y} \sim \mathcal{N} (\textbf{0},K(\textbf{X},\textbf{X})+\sigma_{n}^{2}\textbf{I})$

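Conditioning the joint distribution of the training targets and the value at a new input $latex \textbf{x}_{*}$ on the training data, and writing $latex K=K(\textbf{X},\textbf{X})$ and $latex k_{*}=K(\textbf{X},\textbf{x}_{*})$, we obtain the predicted mean $latex f(\textbf{x}_{*})$ with corresponding variance $latex V(\textbf{x}_{*})$: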

$latex f(\textbf{x}_{*})=k_{*}^{T}(K+\sigma_{n}^{2}\textbf{I})^{-1}\textbf{y}$

$latex V(\textbf{x}_{*})=k(\textbf{x}_{*},\textbf{x}_{*})-k_{*}^{T}(K+\sigma_{n}^{2}\textbf{I})^{-1}k_{*}$
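To make the two formulas concrete, here is a minimal NumPy sketch of the predictive mean and variance. The squared-exponential kernel, the hyperparameter values, and the toy sine data are all my own assumptions for illustration; none of them come from the robot experiments. I also use a Cholesky factorisation instead of forming the inverse explicitly, which is the usual numerically safer route to the same quantities.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B (assumed kernel)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_predict(X, y, x_star, noise_var=1e-2):
    """Return the predictive mean f(x*) and variance V(x*) for a single query point."""
    K = rbf_kernel(X, X)
    k_star = rbf_kernel(X, x_star[None, :])[:, 0]
    # Factorise K + sigma_n^2 I; this is the step that costs O(n^3).
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    # alpha = (K + sigma_n^2 I)^{-1} y, obtained from two triangular systems.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star)
    mean = k_star @ alpha
    # v @ v equals k_*^T (K + sigma_n^2 I)^{-1} k_*.
    var = rbf_kernel(x_star[None, :], x_star[None, :])[0, 0] - v @ v
    return mean, var

# Toy data (hypothetical): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

mean, var = gp_predict(X, y, np.array([0.5]))
print(mean, var)
```

The Cholesky line is exactly the part that becomes painful as the number of training points grows.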

Now, some more prep work in case you are not familiar with LWPR. LWPR is said to be the fastest and most task-appropriate real-time learning algorithm for inverse dynamics problems. The idea is to aggregate the predictions of several locally weighted linear models to give the final inference. The weighted prediction is given by $latex \hat{y}=\mathbb{E}\left \{ \bar{y}_{k}|\textbf{x} \right \}=\sum_{k=1}^{M}\bar{y}_{k}\,p(k|\textbf{x})$, where $latex \bar{y}_{k}$ is the prediction of local model $latex k$, and the probability of model $latex k$ given input data $latex \textbf {x}$ can be expressed as

$latex p(k|\textbf{x})=\frac{p(k,\textbf{x})}{\sum_{k=1}^{M}p(k,\textbf{x})}=\frac{w_{k}}{\sum_{k=1}^{M}w_{k}}$
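Here is a small sketch of just this aggregation step. It is not LWPR itself (there is no projection regression and no incremental updating); I simply assume each local model is a fixed linear function and that the weight $latex w_{k}$ is a Gaussian activation around a centre. The function name `weighted_local_prediction`, the centres, the slopes, and the intercepts are all made up for illustration.

```python
import numpy as np

def weighted_local_prediction(x, centers, slopes, intercepts, width=1.0):
    """Combine M local linear models with normalised Gaussian weights."""
    # w_k: activation of local model k at x (assumed Gaussian around its centre).
    sq_dists = np.sum((centers - x) ** 2, axis=1)
    w = np.exp(-0.5 * sq_dists / width**2)
    # p(k|x) = w_k / sum_k w_k
    p = w / np.sum(w)
    # y_bar_k: prediction of each local linear model at x.
    local_preds = slopes @ x + intercepts
    # y_hat = sum_k y_bar_k * p(k|x)
    return p @ local_preds, p

# Two hypothetical local models in one dimension.
centers = np.array([[-1.0], [1.0]])
slopes = np.array([[2.0], [-1.0]])
intercepts = np.array([1.0, 3.0])

y_hat, p = weighted_local_prediction(np.array([0.0]), centers, slopes, intercepts)
print(y_hat, p)
```

The query point sits exactly halfway between the two centres, so both models get the same weight and the result is the plain average of their two predictions.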

OK, it's finally time to introduce our main character, LGP, and the way we apply it to the robot learning problem. The obvious way to go is to combine GPR with LWPR, so that we get the prediction accuracy of GPR together with the computational efficiency of LWPR.

Look at the plot to the right: these are the training trajectories of a robot's right arm. Red dots show the location of the target object. Our goal is to learn the system dynamics by treating the trajectories as a regression problem. The inputs are the joint angles, velocities, and accelerations of the robot, and the output we want to map to is the joint torque necessary for following a desired trajectory.

Now that we've fully set the scene, the procedure is carried out as follows:

1. First we partition our training data into several sub-regions. We borrow the kernel used in the GP to measure the similarity of two data points, and base the division into sub-regions on that.
2. We then calculate the proximity between a new data point and the existing cluster centers. The point is added to the nearest local model, and the center of that local cluster is updated.
3. All predictions in these sub-regions are given by GPs instead of the linear models used in LWPR.
4. Finally we aggregate all the local predictions together:

$latex \hat{y}(\textbf{x})=\sum_{i=1}^{M}w_{i}\bar{y}_{i}(\textbf{x})/\sum_{i=1}^{M}w_{i}$
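The four steps above can be sketched in a few dozen lines. Treat this as a toy under my own assumptions rather than the implementation from the referenced papers: the class names `LocalGP` and `LocalGPRegression`, the squared-exponential kernel, the threshold `w_gen` for opening a new local model, the choice of the running mean as the cluster center, and the sine data are all hypothetical. For simplicity I also let every local model vote in the prediction and recompute each local solve from scratch.

```python
import numpy as np

def se_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel, used both for similarity and inside each local GP."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(diff**2, axis=2) / length_scale**2)

class LocalGP:
    """One local model: its data, its center, and a plain GP on that data."""
    def __init__(self, x, y):
        self.X = x[None, :].copy()
        self.y = np.array([y], dtype=float)
        self.center = x.copy()

    def add(self, x, y):
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, y)
        self.center = self.X.mean(axis=0)  # step 2: update the cluster center

    def predict(self, x, noise_var):
        # Step 3: standard GP mean, but only on this model's (small) data set.
        K = se_kernel(self.X, self.X) + noise_var * np.eye(len(self.X))
        k_star = se_kernel(self.X, x[None, :])[:, 0]
        return k_star @ np.linalg.solve(K, self.y)

class LocalGPRegression:
    def __init__(self, w_gen=0.5, noise_var=1e-2):
        self.w_gen = w_gen          # assumed threshold for creating a new model
        self.noise_var = noise_var
        self.models = []

    def _weights(self, x):
        centers = np.array([m.center for m in self.models])
        return se_kernel(x[None, :], centers)[0]

    def update(self, x, y):
        # Steps 1 and 2: assign the point to the nearest model if it is close enough.
        if self.models:
            w = self._weights(x)
            nearest = int(np.argmax(w))
            if w[nearest] > self.w_gen:
                self.models[nearest].add(x, y)
                return
        self.models.append(LocalGP(x, y))

    def predict(self, x):
        # Step 4: weighted average of the local GP predictions.
        w = self._weights(x)
        local = np.array([m.predict(x, self.noise_var) for m in self.models])
        return w @ local / np.sum(w)

# Toy data (hypothetical): noisy samples of sin(x), fed in one at a time.
rng = np.random.default_rng(1)
X = rng.uniform(-4.0, 4.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)

lgp = LocalGPRegression()
for x_i, y_i in zip(X, y):
    lgp.update(x_i, y_i)

y_hat = lgp.predict(np.array([1.0]))
print(len(lgp.models), y_hat)
```

A real online version would want to cap the number of points per local model and update each local factorisation incrementally instead of re-solving on every query.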

This is indeed a great way to reduce the computational cost of GPR by incorporating the idea behind LWPR naturally. After all, it's very much a mixture-of-experts approach: we divide the whole input space into smaller subspaces and employ a dedicated GP expert to deal with each of them. Since each matrix inversion now involves a much smaller number of data points, you can see why the computational cost of GP can be greatly reduced.
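To put a rough number on that: assuming the $latex n$ training points are split evenly among $latex M$ local models (an idealisation of mine; real clusters will be uneven, and I'm ignoring the cost of computing the weights), the inversion cost goes from

$latex O(n^{3}) \;\rightarrow\; M \cdot O\left((n/M)^{3}\right) = O(n^{3}/M^{2})$

So for, say, $latex n = 10{,}000$ and $latex M = 100$, that is roughly $latex 10^{12}$ versus $latex 100 \times 100^{3} = 10^{8}$ operations, up to constant factors.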

Everything I gave above is just a rough outline of the whole process. If you're interested in knowing more about the inverse dynamics problem, don't hesitate to read the brilliant paper I referenced below. BTW, now you know the answer to the question in the title: because the robot has got a magic bullet called the Gaussian Process :P

References:

[1] Forte, D., Ude, A. and Kos, A., 2010, June. Robot learning by Gaussian process regression. In Robotics in Alpe-Adria-Danube Region (RAAD), 2010 IEEE 19th International Workshop on (pp. 303-308). IEEE.

[2] Nguyen-Tuong, D., Seeger, M. and Peters, J., 2009. Model learning with local Gaussian process regression. Advanced Robotics, 23(15), pp.2015-2034.