Title
An Introduction to Partial Least Squares Regression
Authors
Randall D. Tobias, SAS Institute Inc., Cary, NC
Summary
Partial Least Squares (PLS) is a popular method for “soft” modeling in industrial applications.
Introduction
Multiple linear regression (MLR) can be a good way to turn data into information when:
- Factors are few in number.
- Factors are not significantly redundant (collinear).
- Factors have a well-understood relationship to the response.
However, if any of these three conditions breaks down, MLR can be inefficient or inappropriate.
PLS is a method for constructing predictive models when the factors are many and highly collinear. The emphasis is on predicting the responses and NOT necessarily on trying to understand the underlying relationship between the variables. E.g., PLS is not usually appropriate for screening out factors that have a negligible effect on the response.
How does PLS work?
MLR will overfit if the number of factors gets too large relative to the number of observations: the model fits the sampled data perfectly but fails to predict new data well.
“Projection to latent structure” (an alternative expansion of “PLS”) captures the core idea:
- Although there are many manifest factors, there may be only a few underlying or latent factors.
- The general idea of PLS is to try to extract these latent factors, accounting for as much of the manifest factor variation as possible while modeling the responses well.
Outline of the method (a minimal sketch in code follows this list):
- Overall goal: use the factors to predict the responses.
- Extract latent variables $T$ (the X-scores) and $U$ (the Y-scores) from the sampled factors and responses.
- The X-scores $T$ are used to predict the Y-scores $U$, and the predicted Y-scores are in turn used to construct predictions for the responses.
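As a concrete illustration of this outline, here is a minimal sketch using scikit-learn's `PLSRegression` on simulated data; the library choice, the data, and the number of components are assumptions for illustration, not part of the paper.

```python
# Minimal sketch of the PLS score-extraction workflow (assumed setup:
# scikit-learn and simulated data; not from the original paper).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 50, 20                        # 50 observations, 20 manifest factors
latent = rng.normal(size=(n, 3))     # 3 underlying latent factors
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.x_scores_                    # X-scores: latent variables from the factors
U = pls.y_scores_                    # Y-scores: latent variables from the responses
y_hat = pls.predict(X)               # predictions constructed from the scores
```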
This procedure actually covers various techniques, depending on which source of variation is considered most crucial:
- Principal Components Regression (PCR):
  - The X-scores are chosen to explain as much of the factor variation as possible. This yields informative directions in the factor space, but they may not be associated with the shape of the predicted surface.
  - Based on the spectral decomposition of $X'X$.
- Maximum Redundancy Analysis (MRA):
  - The Y-scores are chosen to explain as much of the predicted Y variation as possible. This seeks directions in the factor space that are associated with the most variation in the responses, but the predictions may not be very accurate.
  - Based on the spectral decomposition of $\hat{Y}'\hat{Y}$.
- Partial Least Squares:
  - The X- and Y-scores are chosen so that the relationship between successive pairs of scores is as strong as possible. This is like a robust form of redundancy analysis: it seeks directions in the factor space that are associated with high variation in the responses, but biases them toward directions that are accurately predicted (see the sketch after this list).
  - Based on the singular value decomposition of $X'Y$.
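The difference between the variants can be made concrete by computing each one's first factor-space direction. Below is a hedged NumPy sketch, assuming column-centered $X$ and $Y$ (MRA is omitted because it additionally requires the fitted values $\hat{Y}$):

```python
# Sketch: first latent direction under PCR vs. PLS (NumPy only; the data
# shapes are hypothetical and X, Y are assumed column-centered).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20)); X -= X.mean(axis=0)
Y = rng.normal(size=(50, 2));  Y -= Y.mean(axis=0)

# PCR: leading eigenvector of X'X -- the direction of maximum factor variation.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
w_pcr = eigvecs[:, -1]

# PLS: leading left singular vector of X'Y -- the direction of maximum
# covariance between the X-scores and the Y-scores.
U_svd, s, Vt = np.linalg.svd(X.T @ Y)
w_pls = U_svd[:, 0]

# Projecting onto either weight vector gives the corresponding first X-scores.
t_pcr = X @ w_pcr
t_pls = X @ w_pls
```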
Choosing the number of extracted factors:
- If the number of extracted factors is greater than or equal to the rank of the sample factor space, PLS is equivalent to MLR.
- Usually, a great deal fewer factors are required.
- The number is usually chosen by some heuristic technique based on the amount of residual variation.
- (Possible approach) Construct PLS models for a range of numbers of factors on a training set of data and test each on another set, choosing the number of factors for which the total prediction error is minimized: the minimum predicted residual sum of squares (PRESS).
- (Parsimony) Choose the smallest number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.
- If no test set is available, use cross-validation (a minimal sketch follows this list).
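A minimal sketch of cross-validated factor selection via PRESS, assuming scikit-learn and simulated data (the candidate range and fold count are arbitrary choices for illustration):

```python
# Sketch: choose the number of extracted factors by minimizing
# cross-validated PRESS (assumed setup, not the paper's example).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n, p = 60, 15
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.2 * rng.normal(size=n)

press = {}
for k in range(1, 11):               # candidate numbers of extracted factors
    y_cv = cross_val_predict(PLSRegression(n_components=k), X, y, cv=5)
    press[k] = float(np.sum((y - y_cv.ravel()) ** 2))  # predicted residual SS

best_k = min(press, key=press.get)   # a parsimony rule could instead pick the
print(best_k, press[best_k])         # smallest k not significantly worse
```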
Example
Discussion
Other approaches to “soft” modeling:
- other factor extraction techniques, like PCR and MRA.
- ridge regression (RR)
- neural networks (NN)
RR and NN are probably the strongest competitors to PLS in terms of the flexibility and robustness of the resulting predictive models, but neither of them explicitly incorporates dimension reduction (see the sketch below).
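To make that contrast concrete, here is a brief sketch fitting ridge regression and PLS to the same highly collinear factors; scikit-learn and the simulated data are assumptions for illustration, not from the paper.

```python
# Sketch: ridge shrinks coefficients across all 10 collinear columns,
# while PLS reduces them to a single latent direction (assumed data).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
z = rng.normal(size=(40, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(40, 1)) for _ in range(10)])
y = z.ravel() + 0.1 * rng.normal(size=40)

ridge = Ridge(alpha=1.0).fit(X, y)             # keeps the full 10-dimensional space
pls = PLSRegression(n_components=1).fit(X, y)  # one extracted latent factor
print(ridge.coef_.round(3))                    # small coefficients spread over columns
print(pls.x_weights_.ravel().round(3))         # a single direction in factor space
```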
Some other modifications and extensions of PLS:
- SIMPLS: a variant that can be dramatically more efficient to compute when there are many factors.
- Continuum regression: adds a continuous parameter $\alpha \in [0, 1]$, allowing the modeling method to vary continuously between MLR ($\alpha = 0$), PLS ($\alpha = 0.5$), and PCR ($\alpha = 1$).
PLS has become an established tool in chemometric modeling, largely because it is often possible to interpret the extracted factors in terms of the underlying physical system, i.e., to derive “hard” modeling information from the soft model.
Reference
- Tobias, Randall D. (1996). “An Introduction to Partial Least Squares Regression.” SAS Institute Inc., Cary, NC.