Title
A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking
Authors
M.S. Arulampalam ; S. Maskell ; N. Gordon ; T. Clapp
Summary
- Real-world tracking problems increasingly require models that include nonlinearity and non-Gaussianity.
- It is typically crucial to process data online, as it arrives.
- Reviews both optimal and suboptimal Bayesian algorithms for the nonlinear/non-Gaussian tracking problem.
- Focus is on the particle filter.
- Particle filters are sequential Monte Carlo methods based on point-mass (“particle”) representations of probability densities, which generalize traditional Kalman filtering.
- Several variants are covered: SIR, ASIR, and RPF, within a generic framework of Sequential Importance Sampling (SIS).
- Compared against the standard Extended Kalman Filter (EKF).
Non Linear Bayesian Tracking
Probabilistic Model
- State sequence $\{\mathbf{x}_{k}, k \in \mathbb{N}\}$: $\mathbf{x}_{k}=\mathbf{f}_{k}\left(\mathbf{x}_{k-1}, \mathbf{v}_{k-1}\right)$
- Measurement sequence: $\mathbf{z}_{k}=\mathbf{h}_{k}\left(\mathbf{x}_{k}, \mathbf{n}_{k}\right)$
- $\mathbf{f}_k, \mathbf{h}_k$ are possibly nonlinear functions; $\mathbf{v}_{k-1}$ and $\mathbf{n}_k$ are i.i.d. process and measurement noise sequences.
Goal
We seek filtered estimates of $\mathbf{x}_k$ based on the set of all available measurements $\mathbf{z}_{1:k}$.
From a Bayesian perspective, the tracking problem is to recursively quantify some degree of belief in the state $\mathbf{x}_k$ at time $k$, taking different values, given the data $\mathbf{z}_{1:k}$ up to time $k$.
That is, we seek the posterior $p(\mathbf{x}_k | \mathbf{z}_{1:k})$.
Chapman-Kolmogorov equation
$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)=\int p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}\right) p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right) d\mathbf{x}_{k-1}$
then, by Bayes' rule,
$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right)=\frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}\right) p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)}{p\left(\mathbf{z}_{k} | \mathbf{z}_{1:k-1}\right)}$
where the normalizing constant is $p\left(\mathbf{z}_{k} | \mathbf{z}_{1:k-1}\right)=\int p\left(\mathbf{z}_{k} | \mathbf{x}_{k}\right) p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right) d\mathbf{x}_{k}$. These two equations (prediction and update) form the basis for the optimal Bayesian solution.
Optimal Algorithm
Kalman Filter
- The Kalman filter is the optimal solution to the tracking problem when the (highly restrictive) assumptions hold: $\mathbf{f}_k$ and $\mathbf{h}_k$ are linear, and $\mathbf{v}_{k-1}$, $\mathbf{n}_k$ are Gaussian.
- The implication is that no algorithm can ever do better than a Kalman filter in this linear-Gaussian environment.
- It should be noted that it is possible to derive the same results using a least squares (LS) argument.
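As a point of reference, here is a minimal sketch of one Kalman predict/update cycle for the linear-Gaussian model $\mathbf{x}_k = F\mathbf{x}_{k-1} + \mathbf{v}_{k-1}$, $\mathbf{z}_k = H\mathbf{x}_k + \mathbf{n}_k$ (the function name and the matrix names `F`, `Q`, `H`, `R` are illustrative, not notation from the paper):

```python
import numpy as np

def kalman_step(m, P, z, F, Q, H, R):
    """One predict/update cycle; returns the posterior mean and covariance."""
    # Predict: propagate mean and covariance through the linear dynamics.
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update: correct with the new measurement via the Kalman gain.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_post = m_pred + K @ (z - H @ m_pred)
    P_post = (np.eye(len(m)) - K @ H) @ P_pred
    return m_post, P_post
```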
Discussion
If all the distributions are described by their means and covariances, the algorithm can be applied unaltered even when they are not constrained to be Gaussian.
[Assuming the means and covariances to be unbiased and consistent, the filter then optimally propagates the mean and covariance of the posterior.]
However, the filter is no longer certain to be optimal, since the posterior is not necessarily Gaussian.
When smoothed estimates of states are required, i.e., $p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k+\ell}\right)$ with $\ell > 0$, the Kalman smoother is the optimal estimator.
- It is possible to formulate a backward-time KF and combine the forward and backward estimates to obtain an overall smoothed estimate.
- Alternative smoother formulations also exist.
But these have to calculate matrix inverses that do not necessarily exist.
Instead, it is preferable to propagate different quantities, using an information filter, when carrying out the backward-time recursion.
Grid-Based Methods
Optimal when the state space is discrete and finite.
Suppose the state space at time $k-1$ consists of discrete states $\mathbf{x}_{k-1}^{i}$, $i=1, \dots, N_s$. For each $\mathbf{x}_{k-1}^i$, let $\operatorname{Pr}\left(\mathbf{x}_{k-1}=\mathbf{x}_{k-1}^{i} | \mathbf{z}_{1:k-1}\right)=w_{k-1|k-1}^{i}$. Then the posterior at $k-1$ is $p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right)=\sum_{i=1}^{N_s} w_{k-1|k-1}^{i} \delta\left(\mathbf{x}_{k-1}-\mathbf{x}_{k-1}^{i}\right)$, where $\delta(\cdot)$ represents the Dirac delta measure: $\delta_{x}(A)=1_{A}(x)=\begin{cases} 0, & x \notin A \\ 1, & x \in A \end{cases}$
Then
$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)=\sum_{i=1}^{N_s} w_{k|k-1}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right), \qquad p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right)=\sum_{i=1}^{N_s} w_{k|k}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right)$
with
$w_{k|k-1}^{i} \triangleq \sum_{j=1}^{N_s} w_{k-1|k-1}^{j} p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{j}\right), \qquad w_{k|k}^{i} \triangleq \frac{w_{k|k-1}^{i} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right)}{\sum_{j=1}^{N_s} w_{k|k-1}^{j} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{j}\right)}$
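A minimal sketch of this recursion, assuming the transition probabilities are collected in a matrix `P_trans` with `P_trans[i, j] = p(x_k^i | x_{k-1}^j)` and the measurement likelihoods in a vector `lik` (names are illustrative):

```python
import numpy as np

def grid_filter_step(w_prev, P_trans, lik):
    """One recursion of the exact grid-based filter.

    w_prev : (Ns,) posterior weights w_{k-1|k-1}
    P_trans: (Ns, Ns) transition probabilities p(x_k^i | x_{k-1}^j)
    lik    : (Ns,) likelihoods p(z_k | x_k^i) of the new measurement
    """
    w_pred = P_trans @ w_prev     # prediction: sum_j w_{k-1|k-1}^j p(x_k^i | x_{k-1}^j)
    w_post = w_pred * lik         # update: multiply by the likelihood
    return w_post / w_post.sum()  # normalize to obtain w_{k|k}
```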
Suboptimal algorithm
When the assumptions do not hold, approximations are necessary. In this section, we consider approximate nonlinear Bayesian filters.
extended Kalman Filter (EKF)
Discussion
The idea is to use a local linearization of the equations to approximate the nonlinearity, based on the delta method (a first-order Taylor expansion), so that the Kalman filter recursion can be applied.
Higher-order EKFs exist, but the additional complexity has prohibited their widespread use.
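A minimal sketch of one EKF cycle under the common additive-noise form $\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}) + \mathbf{v}_{k-1}$, $\mathbf{z}_k = \mathbf{h}(\mathbf{x}_k) + \mathbf{n}_k$, with user-supplied Jacobian callbacks (all names are illustrative assumptions):

```python
import numpy as np

def ekf_step(m, P, z, f, F_jac, Q, h, H_jac, R):
    """One EKF predict/update cycle for x_k = f(x_{k-1}) + v, z_k = h(x_k) + n."""
    # Predict: propagate the mean through f, the covariance through f's Jacobian.
    m_pred = f(m)
    F = F_jac(m)                  # Jacobian of f evaluated at the prior mean
    P_pred = F @ P @ F.T + Q
    # Update: linearize h around the predicted mean.
    H = H_jac(m_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_post = m_pred + K @ (z - h(m_pred))
    P_post = (np.eye(len(m)) - K @ H) @ P_pred
    return m_post, P_post
```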
unscented Kalman Filter
Consider a set of points deterministically selected from the Gaussian approximation to $p(\mathbf{x}_k | \mathbf{z}_{1:k})$; these points are propagated through the true nonlinearity, and the parameters of the Gaussian approximation are re-estimated.
For some problems, this method has been shown to give better performance, since the parameters of the Gaussian approximation are estimated more accurately.
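A sketch of the underlying unscented transform, in a basic form with a single spread parameter `kappa` (this parameterization is an assumption; the paper does not spell it out):

```python
import numpy as np

def unscented_transform(m, P, g, kappa=0.0):
    """Propagate a Gaussian N(m, P) through a nonlinearity g via sigma points."""
    n = len(m)
    L = np.linalg.cholesky((n + kappa) * P)  # matrix square root of (n + kappa) P
    # 2n + 1 deterministically chosen sigma points around the mean.
    sigmas = np.vstack([m, m + L.T, m - L.T])
    W = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    W[0] = kappa / (n + kappa)
    Y = np.array([g(s) for s in sigmas])     # push each point through g
    m_out = W @ Y                            # re-estimated mean
    P_out = (Y - m_out).T @ np.diag(W) @ (Y - m_out)  # re-estimated covariance
    return m_out, P_out
```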
problems
If the true density is non-Gaussian (e.g., bimodal or heavily skewed), then a Gaussian can never describe it well. In such cases, approximate grid-based filters and particle filters will yield an improvement.
approximate grid-based methods
Decompose the continuous state space into $N_s$ "cells" and apply the grid-based method:
$p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right) \approx \sum_{i=1}^{N_s} w_{k-1|k-1}^{i} \delta\left(\mathbf{x}_{k-1}-\bar{\mathbf{x}}_{k-1}^{i}\right)$, where $\bar{\mathbf{x}}_{k}^{i}$ denotes the center of cell $i$, and
$w_{k|k-1}^{i} \triangleq \sum_{j=1}^{N_s} w_{k-1|k-1}^{j} \int_{\mathbf{x} \in \mathbf{x}_{k}^{i}} p\left(\mathbf{x} | \bar{\mathbf{x}}_{k-1}^{j}\right) d\mathbf{x}, \qquad w_{k|k}^{i} \triangleq \frac{w_{k|k-1}^{i} \int_{\mathbf{x} \in \mathbf{x}_{k}^{i}} p\left(\mathbf{z}_{k} | \mathbf{x}\right) d\mathbf{x}}{\sum_{j=1}^{N_s} w_{k|k-1}^{j} \int_{\mathbf{x} \in \mathbf{x}_{k}^{j}} p\left(\mathbf{z}_{k} | \mathbf{x}\right) d\mathbf{x}}$
In practice, a further approximation is made in the evaluation of these weights: the integrals are approximated by evaluating the densities at the center of each "cell".
Discussion
- The grid must be sufficiently dense.
- The computational cost increases dramatically as the dimensionality increases.
- The state space must be predefined.
Hidden Markov model (HMM) filters are an application of such approximate grid-based methods in a fixed-interval smoothing context.
particle filters
Sequential Importance Sampling (SIS) Algorithm
SIS forms the basis for most sequential Monte Carlo (SMC) filters. The SMC approach is known variously as bootstrap filtering, the condensation algorithm, particle filtering, interacting particle approximations, and survival of the fittest.
Idea
Implement a recursive Bayesian filter by MC simulations. The key idea is to represent the required posterior density function by a set of random samples with associated weights and to compute estimates based on these samples and weights. As the number of samples becomes very large, this MC characterization becomes an equivalent representation of the usual functional description of the posterior pdf, and the SIS filter approaches the optimal Bayesian estimate.
framework
Let $\left\{\mathbf{x}_{0:k}^{i}, w_{k}^{i}\right\}_{i=1}^{N_s}$ denote a random measure that characterizes the posterior pdf $p\left(\mathbf{x}_{0:k} | \mathbf{z}_{1:k}\right)$. Then $p\left(\mathbf{x}_{0:k} | \mathbf{z}_{1:k}\right) \approx \sum_{i=1}^{N_s} w_{k}^{i} \delta\left(\mathbf{x}_{0:k}-\mathbf{x}_{0:k}^{i}\right)$.
- The weights are chosen using the principle of importance sampling
- Draw $\mathbf{x}_{0:k}^i \sim q(\mathbf{x}_{0:k} | \mathbf{z}_{1:k})$, the importance density, by augmenting the existing states $\mathbf{x}_{0:k-1}^i$ with a new state $\mathbf{x}_k^i$.
- Calculate the weights $w_{k}^{i} \propto \frac{p\left(\mathbf{x}_{0:k}^{i} | \mathbf{z}_{1:k}\right)}{q\left(\mathbf{x}_{0:k}^{i} | \mathbf{z}_{1:k}\right)}$ via the recursion $w_{k}^{i} \propto w_{k-1}^{i} \frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right) p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}\right)}{q\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{0:k-1}^{i}, \mathbf{z}_{1:k}\right)} = w_{k-1}^{i} \frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right) p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}\right)}{q\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)}$, where the last equality holds when the importance density depends only on $\mathbf{x}_{k-1}$ and $\mathbf{z}_k$ (Markov property).
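A minimal sketch of one SIS iteration under these formulas, with the proposal sampler and the three densities passed in as callables assumed to accept arrays (all names are illustrative):

```python
import numpy as np

def sis_step(particles, weights, z, sample_q, q_pdf, trans_pdf, lik_pdf):
    """One SIS iteration: propagate each particle and update its weight.

    sample_q(x_prev, z) draws x_k ~ q(x_k | x_{k-1}, z_k);
    q_pdf, trans_pdf, lik_pdf evaluate q(x_k | x_{k-1}, z_k),
    p(x_k | x_{k-1}), and p(z_k | x_k), respectively.
    """
    new_particles = np.array([sample_q(x, z) for x in particles])
    # Weight recursion: w_k ∝ w_{k-1} * p(z|x) p(x|x_prev) / q(x|x_prev, z)
    weights = weights * (lik_pdf(z, new_particles)
                         * trans_pdf(new_particles, particles)
                         / q_pdf(new_particles, particles, z))
    return new_particles, weights / weights.sum()
```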
importance sampling
- Proposal (importance) density: $q(x)$, chosen so that it is easy to sample from.
- Draw samples $x^i \sim q(x)$, $i = 1, \dots, N_s$.
- Then $p(x) \approx \sum_{i=1}^{N_s} w^{i} \delta\left(x-x^{i}\right)$, where $w^{i} \propto \frac{\pi\left(x^{i}\right)}{q\left(x^{i}\right)}$ and $\pi(\cdot)$ is a function proportional to the target density $p(\cdot)$ that can be evaluated pointwise.
Discussion
Degeneracy
A very common problem with the SIS PF: after a few iterations, all but one particle will have negligible weight. The variance of the importance weights can only increase over time, so the phenomenon cannot be avoided.
It implies that a large computational effort is devoted to updating particles whose contribution to the posterior approximation is almost zero.
A suitable measure of degeneracy is the estimated effective sample size $\widehat{N_{eff}}=\frac{1}{\sum_{i=1}^{N_s}\left(w_{k}^{i}\right)^{2}}$.
Note that $\widehat{N_{eff}} \le N_s$, and a smaller value indicates more severe degeneracy.
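As a tiny illustration (assuming normalized weights):

```python
import numpy as np

def effective_sample_size(weights):
    """Estimate N_eff = 1 / sum_i (w_i)^2 for normalized weights."""
    return 1.0 / np.sum(np.asarray(weights) ** 2)

# Uniform weights give N_eff = Ns; a single dominant weight gives N_eff ≈ 1.
```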
Using a very large $N_s$ to counter degeneracy is often impractical; instead, we rely on 2 methods:
- good choice of importance density
- use of resampling.
Choice of Importance Density
The goal is to minimize $\operatorname{Var}(w_k^i)$ so that $N_{eff}$ is maximized.
There are 2 cases when use of the optimal importance density $q\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right) = p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)$ is possible:
- $\mathbf{x}_k$ is a member of a finite set.
- $p(\mathbf{x}_k | \mathbf{x}_{k-1}^i, \mathbf{z}_k)$ is Gaussian. This can occur if the dynamics are nonlinear but the measurements are linear, and $\mathbf{v}_{k-1}$, $\mathbf{n}_k$ are mutually independent i.i.d. Gaussian. Analytic evaluation is then possible.
For many other models, it is possible to construct suboptimal approximations by using local linearization techniques.
Alternatively, use the unscented transform to estimate a Gaussian approximation to $p(\mathbf{x}_k | \mathbf{x}_{k-1}, \mathbf{z}_k)$. The additional computational cost is often more than offset by a reduction in the number of samples required to achieve a given level of performance.
It is often convenient to choose the prior as the importance density, $q\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)=p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}\right)$, which yields $w_{k}^{i} \propto w_{k-1}^{i} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right)$. This is intuitive and simple to implement.
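With this choice, the transition density cancels against the proposal, and the SIS step sketched earlier simplifies to something like the following (names again illustrative):

```python
import numpy as np

def bootstrap_step(particles, weights, z, sample_prior, lik_pdf):
    """SIS step with the prior p(x_k | x_{k-1}) as the importance density:
    transition and proposal densities cancel, leaving w ∝ w_prev * p(z|x)."""
    new_particles = sample_prior(particles)        # x_k^i ~ p(x_k | x_{k-1}^i)
    weights = weights * lik_pdf(z, new_particles)  # w_k^i ∝ w_{k-1}^i p(z_k | x_k^i)
    return new_particles, weights / weights.sum()
```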
Resampling
Resampling is applied whenever significant degeneracy is observed, e.g., when $\widehat{N_{eff}}$ falls below some threshold.
The idea is to eliminate particles that have small weights and to concentrate on particles with large weights.
Generate a new set $\left\{\mathbf{x}_{k}^{i*}\right\}_{i=1}^{N_s}$ by resampling with replacement $N_s$ times from $p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right) \approx \sum_{i=1}^{N_s} w_{k}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right)$, so that $\operatorname{Pr}\left(\mathbf{x}_{k}^{i*}=\mathbf{x}_{k}^{j}\right)=w_{k}^{j}$; the resampled particles are then assigned uniform weights $w_k^i = 1/N_s$.
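The paper presents systematic resampling as one convenient scheme; a common sketch of it (implementation details are assumptions, not taken from the paper):

```python
import numpy as np

def systematic_resample(particles, weights):
    """Systematic resampling: one uniform draw, Ns evenly spaced positions."""
    Ns = len(weights)
    positions = (np.random.uniform() + np.arange(Ns)) / Ns  # stratified positions in [0, 1)
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0                          # guard against floating-point round-off
    idx = np.searchsorted(cumsum, positions)  # invert the empirical CDF
    return particles[idx], np.full(Ns, 1.0 / Ns)  # equal weights after resampling
```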
Problems
- Limits the opportunity to parallelize, since all the particles must be combined.
- Particles that have high weights are statistically selected many times, so the resultant sample contains many repeated points and diversity is lost. [Sample impoverishment]
- Smoothed estimates based on the particles' paths degenerate, since resampling reduces the diversity of the paths.
Other particle filters
The other PFs can be regarded as special cases of the general SIS PF, derived by an appropriate choice of importance density and/or a modification of the resampling step.
- Sampling importance resampling (SIR)
- Auxiliary sampling importance resampling (ASIR) filter
- Regularized Particle Filter (RPF)
omitted
Example
omitted
Discussion
- If the assumptions of the Kalman filter or grid-based filter hold, then no other algorithm can outperform them; however, these assumptions do not hold in a variety of real scenarios.
- The EKF approximates the probability density by a Gaussian.
- Approximate grid-based filters approximate the continuous state space as a set of discrete regions and become computationally expensive for high-dimensional state spaces.
- Particle filtering approximates the density directly as a finite set of MC samples. The choice of importance density is critical.
Reference
- M. S. Arulampalam, S. Maskell, N. Gordon and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” in IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, Feb. 2002. doi: 10.1109/78.978374
- https://en.wikipedia.org/wiki/Dirac_measure