A Peep into Kalman Filter

3 minute read


Deep insecurity about my non-mathy background urged me to look into the things that keep buzzing in my ears. Today I decided to do some quick reading on the Kalman filter, and now it's time to write down my understanding. Bear with me, all you "mathemagicians".

Kalman filtering (KF), also known as linear quadratic estimation (LQE), uses Bayesian inference to estimate the joint probability distribution over the state variables at each time frame. It has numerous applications in technology, and has recently been applied to robot motion planning and control problems. People have been using it for trajectory optimization in robotics in recent years, which is exactly the problem I described in my previous post. Back then, I talked about incorporating Locally Weighted Projection Regression (LWPR) into Gaussian Processes (GP) in order to make accurate estimates online in real time. With the Kalman filter, running recursively in real time is one of its innate properties, which makes it a natural fit for online prediction in robot learning problems. Disclaimer: this post only looks into the mechanics of the Kalman filter on its own.

It works in a two-step fashion. Assuming we are at time $latex t $, in the first step it produces estimates of the current state variables along with their uncertainties; as we move forward and observe the measurement at time $latex t+1 $, the estimate made at time $latex t $ is updated through a weighted average (somewhat similar to a moving average in time series analysis? Yes, KF has its place in time series analysis as well, especially in signal processing and econometrics). Very much like a Gaussian Process, it is used to calculate the posterior mean and variance of a multivariate Gaussian at each time $latex t_{k} $. Our observations $latex y_{k} $, drawn from a multivariate Gaussian process $latex f_{k} $, can be described as follows:
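To make the two-step fashion concrete, here is a minimal predict/update sketch in Python/NumPy. The state-transition matrix F, observation matrix H, process noise Q, and measurement noise R are illustrative placeholders, not taken from any particular library or from the equations below.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Step 1: propagate the state estimate x and its covariance P forward in time."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, H, R):
    """Step 2: correct the prediction with the new measurement y via a weighted average."""
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain (the "weight")
    x_new = x_pred + K @ (y - H @ x_pred)  # blend prediction and measurement
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```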

$latex y_{k}=H_{k}f_{k}+\epsilon_{k} $

where $latex H_{k} $ is a linear observation model, and $latex \epsilon_{k} $ is a zero-mean multivariate Gaussian random vector with covariance $latex R=\sigma^{2}I $. Assume we want to make inference on a non-linear function of the following shape, where the four points are the ones we have observed, albeit noisily.

[Figure: a non-linear function with four noisy observed points]

Thus, we assume $latex y_{k}=H_{k}f(\textbf{X})+\epsilon_{k} $. We can think of $latex H $ as the link that relates the internal state of the system to the output measurement, while $latex \epsilon_{k} $ describes how noisy the measurement is through the covariance matrix $latex R $. Our focus then naturally zooms in on $latex f(\textbf{X}) $, whose posterior mean and variance can be calculated as follows:

$latex \bar{f}_{|k}(X^{*})=\bar{f}_{|k-1}(X^{*})+P_{|k-1}(X^{*},X)\cdot \left ( P_{|k-1}(X,X)+\sigma^{2}I \right )^{-1}\cdot \left ( Y_{k}-\bar{f}_{|k-1}(X) \right ) $

$latex P_{|k}(X^{*},X^{*})=P_{|k-1}(X^{*},X^{*})-P_{|k-1}(X^{*},X)\cdot \left ( P_{|k-1}(X,X)+\sigma^{2}I \right )^{-1}\cdot P_{|k-1}(X,X^{*}) $

where the first equation calculates the posterior mean at the new input $latex X^{*} $ given all previously observed points, and the second gives the posterior covariance at $latex X^{*} $.
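Here is a minimal sketch of these two update equations in Python/NumPy, under the same assumptions as above (observation noise $latex R=\sigma^{2}I $, prior covariances evaluated at the observed inputs $latex X $ and the new inputs $latex X^{*} $). The function name and argument names are my own, purely for illustration.

```python
import numpy as np

def posterior_update(mean_star, mean_obs, P_ss, P_sx, P_xx, y, sigma2):
    """One Kalman-style update of the posterior over f at the new inputs X*.

    mean_star : prior mean of f at X*            (m,)
    mean_obs  : prior mean of f at observed X    (n,)
    P_ss      : prior covariance P(X*, X*)       (m, m)
    P_sx      : prior cross-covariance P(X*, X)  (m, n)
    P_xx      : prior covariance P(X, X)         (n, n)
    y         : noisy observations Y_k at X      (n,)
    sigma2    : measurement noise variance, R = sigma2 * I
    """
    S = P_xx + sigma2 * np.eye(len(y))   # covariance of the noisy observations
    gain = P_sx @ np.linalg.inv(S)       # analogue of the Kalman gain
    post_mean = mean_star + gain @ (y - mean_obs)
    post_cov = P_ss - gain @ P_sx.T
    return post_mean, post_cov
```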

With this setup, we can make predictions for a new input $latex X^{*} $ using the above equations. Note that these are iterative equations, so we need to specify a prior covariance for $latex f $ to start with, which can conveniently be borrowed from a Gaussian Process kernel. Everything I've said here is just one narrow perspective on how you can view a Kalman filter, and I haven't looked into the details of how to incorporate GP kernels into the Kalman filter process. Another post, "Kalman Filter for Dummies", explains the idea very clearly with a slightly different set of notations.
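To show how a GP kernel can supply that prior covariance, here is a toy usage sketch reusing the posterior_update function above, with a squared-exponential (RBF) kernel standing in for the prior. The observed points and hyperparameters are made up purely for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel, borrowed from GP as the prior covariance of f."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Four noisy observations of an unknown non-linear function (toy numbers).
X = np.array([-2.0, -0.5, 1.0, 2.5])
y = np.sin(X) + 0.1 * np.random.randn(len(X))

X_star = np.linspace(-3, 3, 50)   # new inputs to predict at

post_mean, post_cov = posterior_update(
    mean_star=np.zeros(len(X_star)),   # zero prior mean at X*
    mean_obs=np.zeros(len(X)),         # zero prior mean at X
    P_ss=rbf(X_star, X_star),
    P_sx=rbf(X_star, X),
    P_xx=rbf(X, X),
    y=y,
    sigma2=0.01,
)
```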


References:

[1] https://en.wikipedia.org/wiki/Kalman_filter

[2] Reece, S. and Roberts, S. (2010). An introduction to Gaussian processes for the Kalman filter expert. In Proceedings of the 13th Conference on Information Fusion (FUSION), pp. 1-9. IEEE.