Paper 2

Title

A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking


Authors

M. S. Arulampalam, S. Maskell, N. Gordon, T. Clapp

Summary

  • Nonlinearity and non-Gaussianity are increasingly included in models.
  • It is typically crucial to process data on-line as it arrives.
  • Reviews both optimal and suboptimal Bayesian algorithms for the nonlinear/non-Gaussian tracking problem.
  • Focuses on particle filters.
  • Particle filters are sequential Monte Carlo methods based on point-mass (“particle”) representations of probability densities, which generalize traditional Kalman filtering.
  • Several variants: SIR, ASIR, RPF, within a generic framework of sequential importance sampling (SIS).
  • Compares against the standard extended Kalman filter (EKF).

Non Linear Bayesian Tracking

Probabilistic Model

  • State sequence $\left\{\mathbf{x}_{k}, k \in \mathbb{N}\right\}$: $\mathbf{x}_{k}=\mathbf{f}_{k}\left(\mathbf{x}_{k-1}, \mathbf{v}_{k-1}\right)$
  • Measurement sequence: $\mathbf{z}_{k}=\mathbf{h}_{k}\left(\mathbf{x}_{k}, \mathbf{n}_{k}\right)$
  • $\mathbf{f}_k, \mathbf{h}_k$ are possibly nonlinear functions; $\mathbf{v}_{k-1}$ and $\mathbf{n}_{k}$ are the process and measurement noise sequences.

Goal

We seek filtered estimates of $\mathbf{x}_k$ based on the set of all available measurements $\mathbf{z}_{1:k}$.

From a Bayesian perspective, the tracking problem is to recursively calculate some degree of belief in the state $\mathbf{x}_k$ at time $k$, taking different values, given the data $\mathbf{z}_{1:k}$ up to time $k$.

That is, $p(\mathbf{x}_k | \mathbf{z}_{1:k})$.

Chapman-Kolmogorov equation

$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)=\int p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}\right) p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right) d\mathbf{x}_{k-1}$

then, by Bayes' rule,

$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right)=\frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}\right) p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)}{p\left(\mathbf{z}_{k} | \mathbf{z}_{1:k-1}\right)}$

These two equations (prediction and update) form the basis for the optimal Bayesian solution.
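
For completeness, the normalizing constant in the update equation is

$p\left(\mathbf{z}_{k} | \mathbf{z}_{1:k-1}\right)=\int p\left(\mathbf{z}_{k} | \mathbf{x}_{k}\right) p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right) d\mathbf{x}_{k}$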

Optimal Algorithm

Kalman Filter

  • This is the optimal solution to the tracking problem—if the (highly restrictive) assumptions hold.
  • The implication is that no algorithm can ever do better than a Kalman filter in this linear Gaussian environment.
  • It should be noted that it is possible to derive the same results using a least squares (LS) argument.
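
For concreteness, a minimal NumPy sketch of the predict/update recursion (the function name and the way the model matrices are passed in are illustrative, not from the paper):

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of the Kalman filter.

    x, P : posterior mean/covariance at time k-1
    z    : measurement at time k
    F, Q : linear dynamics x_k = F x_{k-1} + v_{k-1}, v ~ N(0, Q)
    H, R : linear measurement z_k = H x_k + n_k,     n ~ N(0, R)
    """
    # Predict: propagate mean and covariance through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new measurement.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```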

Discussion

If the distributions are described only by their means and covariances (they are not constrained to be Gaussian), the algorithm can still be applied unaltered.

[Assuming the estimated means and covariances are unbiased and consistent, the filter then optimally propagates the mean and covariance of the posterior.]

However, the filter is not certain to be optimal when the posterior is not Gaussian.

When smoothed estimates of states are required, i.e., $p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k+\ell}\right)$ with $\ell > 0$, the Kalman smoother is the optimal estimator.

  • It’s possible to formulate a backward-time KF and combine the forward and backward estimates to obtain an overall smoothed estimator.
  • or use an alternative formulation.

But these approaches require calculating matrix inverses that do not necessarily exist.

Instead, it is preferable to propagate different quantities using an information filter when carrying out the backward-time recursion.

Grid-Based Methods

Optimal when the state space is discrete and finite.

Suppose the state space at time $k-1$ consists of discrete states $\mathbf{x}_{k-1}^{i}, i=1, \dots, N_{s}$. For each $\mathbf{x}_{k-1}^i$, let $\operatorname{Pr}\left(\mathbf{x}_{k-1}=\mathbf{x}_{k-1}^{i} | \mathbf{z}_{1:k-1}\right)=w_{k-1|k-1}^{i}$. Then the posterior at $k-1$ is $p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right)=\sum_{i=1}^{N_{s}} w_{k-1|k-1}^{i} \delta\left(\mathbf{x}_{k-1}-\mathbf{x}_{k-1}^{i}\right)$, where $\delta(\cdot)$ represents the Dirac delta measure: $\delta_{x}(A)=1_{A}(x)=\left\{\begin{array}{ll}{0,} & {x \notin A} \\ {1,} & {x \in A}\end{array}\right.$

Then

$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k-1}\right)=\sum_{i=1}^{N_{s}} w_{k|k-1}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right)$

$p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right)=\sum_{i=1}^{N_{s}} w_{k|k}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right)$

with

$w_{k|k-1}^{i} \triangleq \sum_{j=1}^{N_{s}} w_{k-1|k-1}^{j} p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{j}\right)$

$w_{k|k}^{i} \triangleq \frac{w_{k|k-1}^{i} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right)}{\sum_{j=1}^{N_{s}} w_{k|k-1}^{j} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{j}\right)}$
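
A minimal sketch of this recursion in NumPy, assuming the transition probabilities and likelihoods have been precomputed into an array `T` and a vector `lik` (names are illustrative):

```python
import numpy as np

def grid_filter_step(w, T, lik):
    """One step of the exact grid-based filter.

    w   : (Ns,) posterior weights w_{k-1|k-1}^i
    T   : (Ns, Ns) transition matrix with T[i, j] = p(x_k^i | x_{k-1}^j)
    lik : (Ns,) likelihoods p(z_k | x_k^i)
    """
    w_pred = T @ w                # prediction: w_{k|k-1}^i = sum_j T[i,j] w[j]
    w_post = w_pred * lik         # update: reweight by the likelihood
    return w_post / w_post.sum()  # normalize
```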

Suboptimal Algorithms

When the assumptions do not hold, approximations are necessary. In this section, we consider approximate nonlinear Bayesian filters:

Extended Kalman Filter (EKF)

Discussion

The idea is to use a local linearization of the equations (a first-order Taylor expansion, as in the delta method) to approximate the nonlinearity.

Higher-order EKFs exist, but the additional complexity has prohibited their widespread use.
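
A sketch of one EKF cycle, assuming the user supplies the nonlinear functions `f`, `h` and callables for their Jacobians (all names hypothetical):

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, Q, h, H_jac, R):
    """One EKF cycle: linearize f and h around the current estimate.

    f, h         : nonlinear dynamics and measurement functions
    F_jac, H_jac : callables returning the Jacobians of f and h at a point
    """
    # Predict: mean goes through the full nonlinearity,
    # covariance through the first-order (Taylor) linearization.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the measurement function at the predicted mean.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```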

Unscented Kalman Filter (UKF)

Consider a set of points deterministically selected from the Gaussian approximation to $p(\mathbf{x}_k | \mathbf{z}_{1:k})$; these points are propagated through the true nonlinearity, and the parameters of the Gaussian approximation are re-estimated.

For some problems, this method has been shown to give better performance, since the parameters of the Gaussian approximation are improved.
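
A sketch of the underlying unscented transform, assuming a single scaling parameter `kappa` (one common parameterization; the paper does not prescribe this exact form):

```python
import numpy as np

def unscented_transform(mean, cov, g, kappa=1.0):
    """Propagate a Gaussian N(mean, cov) through a nonlinearity g.

    Deterministically selects 2n+1 sigma points, pushes each through g,
    and re-estimates the mean and covariance of the result.
    kappa is a scaling parameter; it must satisfy n + kappa > 0.
    """
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)  # matrix square root
    sigma = [mean] + [mean + L[:, i] for i in range(n)] \
                   + [mean - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 0.5 / (n + kappa))  # sigma-point weights
    w[0] = kappa / (n + kappa)
    y = np.array([g(s) for s in sigma])        # propagated points
    mean_y = w @ y
    cov_y = sum(wi * np.outer(yi - mean_y, yi - mean_y)
                for wi, yi in zip(w, y))
    return mean_y, cov_y
```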

Problems

If the true density is non-Gaussian (e.g., bimodal or heavily skewed), a Gaussian can never describe it well. In such cases, approximate grid-based filters and particle filters will yield an improvement.

Approximate Grid-Based Methods

Decompose the continuous state space into $N_s$ “cells” and apply the grid-based method above.

$p\left(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1}\right) \approx \sum_{i=1}^{N_{s}} w_{k-1|k-1}^{i} \delta\left(\mathbf{x}_{k-1}-\overline{\mathbf{x}}_{k-1}^{i}\right)$, where $\overline{\mathbf{x}}_{k}^{i}$ denotes the center of cell $\mathbf{x}_{k}^{i}$, and

$w_{k|k-1}^{i} \triangleq \sum_{j=1}^{N_{s}} w_{k-1|k-1}^{j} \int_{\mathbf{x} \in \mathbf{x}_{k}^{i}} p\left(\mathbf{x} | \overline{\mathbf{x}}_{k-1}^{j}\right) d\mathbf{x}$

$w_{k|k}^{i} \triangleq \frac{w_{k|k-1}^{i} \int_{\mathbf{x} \in \mathbf{x}_{k}^{i}} p\left(\mathbf{z}_{k} | \mathbf{x}\right) d\mathbf{x}}{\sum_{j=1}^{N_{s}} w_{k|k-1}^{j} \int_{\mathbf{x} \in \mathbf{x}_{k}^{j}} p\left(\mathbf{z}_{k} | \mathbf{x}\right) d\mathbf{x}}$

In practice, a further approximation is made when evaluating $w_{k|k}^i$: the densities are simply evaluated at the center of each cell rather than integrated over it.

Discussion

  • grid must be sufficiently dense,
  • computational cost increases dramatically when dimensionality increases.
  • state space must be predefined.

Hidden Markov model (HMM) filters are an application of such approximate grid-based methods in a fixed-interval smoothing context.

Particle Filters

Sequential Importance Sampling (SIS) Algorithm

SIS forms the basis for most sequential Monte Carlo (SMC) filters. The SMC approach is known variously as bootstrap filtering, the condensation algorithm, particle filtering, interacting particle approximations, and survival of the fittest.

Idea

Implement a recursive Bayesian filter by Monte Carlo (MC) simulations: represent the required posterior density function by a set of random samples with associated weights, and compute estimates based on these samples and weights. As the number of samples becomes very large, this MC characterization becomes an equivalent representation of the usual functional description of the posterior pdf, and the SIS filter approaches the optimal Bayesian estimate.

Framework

Let $\left\{\mathbf{x}_{0:k}^{i}, w_{k}^{i}\right\}_{i=1}^{N_{s}}$ denote a random measure that characterizes the posterior pdf $p\left(\mathbf{x}_{0:k} | \mathbf{z}_{1:k}\right)$. Then $p\left(\mathbf{x}_{0:k} | \mathbf{z}_{1:k}\right) \approx \sum_{i=1}^{N_{s}} w_{k}^{i} \delta\left(\mathbf{x}_{0:k}-\mathbf{x}_{0:k}^{i}\right)$.

  • The weights are chosen using the principle of importance sampling (see below):
    • Draw $\mathbf{x}_{0:k}^i$ from an importance density $q(\mathbf{x}_{0:k} | \mathbf{z}_{1:k})$ by augmenting the existing states $\mathbf{x}_{0:k-1}^i$ with a new state $\mathbf{x}_k^i$.
    • Calculate the weights $w_{k}^{i} \propto \frac{p\left(\mathbf{x}_{0:k}^{i} | \mathbf{z}_{1:k}\right)}{q\left(\mathbf{x}_{0:k}^{i} | \mathbf{z}_{1:k}\right)}$ recursively: $w_k^i \propto w_{k-1}^{i} \frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right) p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}\right)}{q\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{0:k-1}^{i}, \mathbf{z}_{1:k}\right)} = w_{k-1}^{i} \frac{p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right) p\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}\right)}{q\left(\mathbf{x}_{k}^{i} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)}$, where the last equality uses the Markov property.
Importance sampling:
  • proposal (importance) density: $q(x)$
  • draw $x^i \sim q(x)$
  • then $p(x) \approx \sum_{i=1}^{N_{s}} w^{i} \delta\left(x-x^{i}\right)$, where $w^{i} \propto \frac{\pi\left(x^{i}\right)}{q\left(x^{i}\right)}$ and $\pi(x)$ is a density proportional to the target $p(x)$.
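
Putting the SIS equations together, a minimal sketch of one iteration; `q_sample`, `q_pdf`, `trans_pdf`, and `lik` are assumed user-supplied callables, not part of the paper:

```python
import numpy as np

def sis_step(particles, weights, z, q_sample, q_pdf, trans_pdf, lik):
    """One SIS iteration: propagate each particle, then update its weight.

    particles : (Ns, d) array of samples x_{k-1}^i
    weights   : (Ns,) normalized weights w_{k-1}^i
    q_sample(x_prev, z)  draws x_k^i from q(x_k | x_{k-1}^i, z_k)
    q_pdf(x, x_prev, z)  evaluates q;  trans_pdf(x, x_prev) evaluates
    p(x_k | x_{k-1});    lik(z, x) evaluates p(z_k | x_k)
    """
    new_particles = np.array([q_sample(x, z) for x in particles])
    w = weights * np.array([
        lik(z, xk) * trans_pdf(xk, xp) / q_pdf(xk, xp, z)
        for xk, xp in zip(new_particles, particles)
    ])
    return new_particles, w / w.sum()
```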
Discussion
Degeneracy

A very common problem with the SIS PF: after a few iterations, all but one particle will have negligible weight. In fact, the variance of the importance weights can only increase over time.

This implies that a large computational effort is devoted to updating particles whose contribution to the posterior approximation is almost zero.

A suitable measure of degeneracy is the estimated effective sample size $\widehat{N}_{eff}=\frac{1}{\sum_{i=1}^{N_{s}}\left(w_{k}^{i}\right)^{2}}$.

Note that $1 \le \widehat{N}_{eff} \le N_s$ for normalized weights, and a smaller value indicates more severe degeneracy.
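
A quick numerical check of the two extremes implied by these bounds:

```python
import numpy as np

def n_eff(w):
    """Effective sample size estimate from normalized weights."""
    return 1.0 / np.sum(w ** 2)

Ns = 100
print(n_eff(np.full(Ns, 1.0 / Ns)))  # uniform weights        -> 100.0 (= Ns)
w = np.zeros(Ns); w[0] = 1.0
print(n_eff(w))                      # one dominant particle  -> 1.0
```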

Using a large $N_s$ to counter degeneracy is often impractical; instead, we rely on two methods:

  • good choice of importance density
  • use of resampling.
Choice of Importance Density

The goal is to choose the importance density that minimizes $\operatorname{Var}(w_k^i)$, so that $N_{eff}$ is maximized.

Two cases in which use of the optimal importance density, $p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)$, is possible:

  • $\mathbf{x}_k$ is a member of a finite set.
  • $p(\mathbf{x}_k | \mathbf{x}_{k-1}^i, \mathbf{z}_k)$ is Gaussian. This can occur if the dynamics are nonlinear but the measurements are linear, and $\mathbf{v}_{k-1}, \mathbf{n}_k$ are mutually independent i.i.d. Gaussian. Analytic evaluation is then possible.

For many other models, it is possible to construct suboptimal approximations to the optimal importance density by using local linearization techniques.

Alternatively, use the unscented transform to estimate a Gaussian approximation to $p(\mathbf{x}_k | \mathbf{x}_{k-1}, \mathbf{z}_k)$. The additional computational cost is often more than offset by a reduction in the number of samples required to achieve a given level of performance.

It is often convenient to choose the prior, $q\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}, \mathbf{z}_{k}\right)=p\left(\mathbf{x}_{k} | \mathbf{x}_{k-1}^{i}\right)$, which yields $w_{k}^{i} \propto w_{k-1}^{i} p\left(\mathbf{z}_{k} | \mathbf{x}_{k}^{i}\right)$. This choice is intuitive and simple to implement, as sketched below.
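
With this prior proposal, the transition density cancels out of the weight update and each particle is simply reweighted by its likelihood. A minimal sketch (`sample_dynamics` and `lik` are assumed callables):

```python
import numpy as np

def bootstrap_step(particles, weights, z, sample_dynamics, lik):
    """SIS step with q = p(x_k | x_{k-1}): propagate each particle
    through the dynamics, then reweight by the likelihood alone."""
    new_particles = np.array([sample_dynamics(x) for x in particles])
    w = weights * np.array([lik(z, x) for x in new_particles])
    return new_particles, w / w.sum()
```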

Resampling

Applied when significant degeneracy is observed (e.g., when $\widehat{N}_{eff}$ falls below some threshold $N_T$).

Idea is to eliminate particles that have small weights and to concentrate on particles with large weights.

Generate a new set $\left\{\mathbf{x}_{k}^{i*}\right\}_{i=1}^{N_{s}}$ by resampling with replacement $N_s$ times from $p\left(\mathbf{x}_{k} | \mathbf{z}_{1:k}\right) \approx \sum_{i=1}^{N_{s}} w_{k}^{i} \delta\left(\mathbf{x}_{k}-\mathbf{x}_{k}^{i}\right)$ so that $\operatorname{Pr}\left(\mathbf{x}_{k}^{i*}=\mathbf{x}_{k}^{j}\right)=w_{k}^{j}$.
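
One common $O(N_s)$ scheme is systematic resampling; a minimal sketch, assuming `particles` is an array indexed along its first axis:

```python
import numpy as np

def systematic_resample(particles, weights, rng=None):
    """Resample with replacement so that Pr(pick particle j) = weights[j]."""
    rng = rng or np.random.default_rng()
    Ns = len(weights)
    # One uniform draw, then Ns evenly spaced positions through the CDF.
    positions = (rng.random() + np.arange(Ns)) / Ns
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx], np.full(Ns, 1.0 / Ns)  # weights reset to 1/Ns
```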

Problems
  • limits the opportunity to parallelize, since all the particles must be combined.
  • particles that have high weights are statistically selected many times, so the resultant sample contains many repeated points; this loss of diversity is known as sample impoverishment.
  • smoothed estimates based on the particles' paths degenerate.

Other Particle Filters

The other PFs can be regarded as special cases of the general SIS PF, derived by an appropriate choice of importance sampling density and/or a modification of the resampling step:

  • Sampling importance resampling (SIR)
  • Auxiliary sampling importance resampling (ASIR) filter
  • Regularized particle filter (RPF)

omitted

Example

omitted

Discussion

  • If the assumptions of the Kalman filter or grid-based filter hold, then no other algorithm can outperform them. However, these assumptions do not hold in a variety of real scenarios.
  • The EKF approximates the posterior density by a Gaussian.
  • Approximate grid-based filters approximate the continuous state space as a set of discrete regions, and become computationally expensive for high-dimensional state spaces.
  • Particle filtering approximates the density directly as a finite number of MC samples. The choice of importance density is critical.

Reference

  • M. S. Arulampalam, S. Maskell, N. Gordon and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, Feb. 2002. doi: 10.1109/78.978374
  • https://en.wikipedia.org/wiki/Dirac_measure

Thoughts
