Title
An Introduction to Partial Least Squares Regression
Authors
Randall D. Tobias, SAS Institute Inc., Cary, NC
Summary
Partial Least Squares (PLS) is a popular method for “soft” modeling in industrial applications.
Introduction
Multiple linear regression (MLR) can be a good way to turn data into information when:
- Factors are few in number.
- Factors are not significantly redundant (collinear).
- Factors have a well-understood relationship to the response.
However, if any of these three conditions breaks down, MLR can be inefficient or inappropriate.
PLS is a method for constructing predictive models when the factors are many and highly collinear. The emphasis is on predicting the responses and NOT necessarily on trying to understand the underlying relationship between the variables. E.g., PLS is not usually appropriate for screening out factors that have a negligible effect on the response.
How does PLS work?
MLR will overfit if the number of factors gets too large relative to the number of observations: the model fits the sampled data perfectly but fails to predict new data well.
“Projection to latent structure” (an alternative expansion of “PLS”) captures the core idea:
- Although there are many manifest factors, there may be only a few underlying or latent factors.
- The general idea of PLS is to try to extract these latent factors, accounting for as much of the manifest factor variation as possible while modeling the responses well.
Outline of the method (a minimal sketch in code follows this list):
- Overall goal: use the factors to predict the responses.
- Extract latent variables $T$ (the X-scores) and $U$ (the Y-scores) from the sampled factors and responses.
- The X-scores $T$ are used to predict the Y-scores $U$, and the predicted Y-scores are in turn used to construct predictions for the responses.
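As a concrete illustration of this outline, here is a minimal sketch using scikit-learn's `PLSRegression` on simulated data; the library choice, the data, and the number of components are assumptions for illustration, not part of the paper.

```python
# Minimal sketch of the PLS score-extraction workflow (assumed setup:
# scikit-learn and simulated data; not from the original paper).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 50, 20                        # 50 observations, 20 manifest factors
latent = rng.normal(size=(n, 3))     # 3 underlying latent factors
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.x_scores_                    # X-scores: latent variables from the factors
U = pls.y_scores_                    # Y-scores: latent variables from the responses
y_hat = pls.predict(X)               # predictions constructed from the scores
```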
This procedure actually covers various techniques, depending on which source of variation is considered most crucial:
- Principal Components Regression (PCR):
  - The X-scores are chosen to explain as much of the factor variation as possible. This yields informative directions in the factor space, but they may not be associated with the shape of the predicted surface.
  - Based on the spectral decomposition of $X'X$.
- Maximum Redundancy Analysis (MRA):
  - The Y-scores are chosen to explain as much of the predicted Y variation as possible. This seeks directions in the factor space that are associated with the most variation in the responses, but the predictions may not be very accurate.
  - Based on the spectral decomposition of $\hat{Y}'\hat{Y}$.
- Partial Least Squares:
  - The X- and Y-scores are chosen so that the relationship between successive pairs of scores is as strong as possible. This is like a robust form of redundancy analysis: it seeks directions in the factor space that are associated with high variation in the responses, but biases them toward directions that are accurately predicted (see the sketch after this list).
  - Based on the singular value decomposition of $X'Y$.
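The difference between the variants can be made concrete by computing each one's first factor-space direction. Below is a hedged NumPy sketch, assuming column-centered $X$ and $Y$ (MRA is omitted because it additionally requires the fitted values $\hat{Y}$):

```python
# Sketch: first latent direction under PCR vs. PLS (NumPy only; the data
# shapes are hypothetical and X, Y are assumed column-centered).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20)); X -= X.mean(axis=0)
Y = rng.normal(size=(50, 2));  Y -= Y.mean(axis=0)

# PCR: leading eigenvector of X'X -- the direction of maximum factor variation.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
w_pcr = eigvecs[:, -1]

# PLS: leading left singular vector of X'Y -- the direction of maximum
# covariance between the X-scores and the Y-scores.
U_svd, s, Vt = np.linalg.svd(X.T @ Y)
w_pls = U_svd[:, 0]

# Projecting onto either weight vector gives the corresponding first X-scores.
t_pcr = X @ w_pcr
t_pls = X @ w_pls
```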
Choosing the number of extracted factors:
- If the number of extracted factors is greater than or equal to the rank of the sample factor space, PLS is equivalent to MLR.
- Usually, a great deal fewer factors are required.
- The number is usually chosen by some heuristic technique based on the amount of residual variation.
- (Possible approach) Construct PLS models for a range of numbers of factors on a training set of data and test each on another set, choosing the number of factors for which the total prediction error is minimized: the minimum predicted residual sum of squares (PRESS).
- (Parsimony) Choose the smallest number of extracted factors whose residuals are not significantly greater than those of the model with minimum error.
- If no test set is available, use cross-validation (a minimal sketch follows this list).
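A minimal sketch of cross-validated factor selection via PRESS, assuming scikit-learn and simulated data (the candidate range and fold count are arbitrary choices for illustration):

```python
# Sketch: choose the number of extracted factors by minimizing
# cross-validated PRESS (assumed setup, not the paper's example).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n, p = 60, 15
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.2 * rng.normal(size=n)

press = {}
for k in range(1, 11):               # candidate numbers of extracted factors
    y_cv = cross_val_predict(PLSRegression(n_components=k), X, y, cv=5)
    press[k] = float(np.sum((y - y_cv.ravel()) ** 2))  # predicted residual SS

best_k = min(press, key=press.get)   # a parsimony rule could instead pick the
print(best_k, press[best_k])         # smallest k not significantly worse
```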
Example
Discussion
Other approaches to “soft” modeling:
- other factor extraction techniques, like PCR and MRA.
- ridge regression (RR)
- neural networks (NN)
RR and NN are probably the strongest competitors to PLS in terms of the flexibility and robustness of the resulting predictive models, but neither of them explicitly incorporates dimension reduction (see the sketch below).
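To make that contrast concrete, here is a brief sketch fitting ridge regression and PLS to the same highly collinear factors; scikit-learn and the simulated data are assumptions for illustration, not from the paper.

```python
# Sketch: ridge shrinks coefficients across all 10 collinear columns,
# while PLS reduces them to a single latent direction (assumed data).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
z = rng.normal(size=(40, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(40, 1)) for _ in range(10)])
y = z.ravel() + 0.1 * rng.normal(size=40)

ridge = Ridge(alpha=1.0).fit(X, y)             # keeps the full 10-dimensional space
pls = PLSRegression(n_components=1).fit(X, y)  # one extracted latent factor
print(ridge.coef_.round(3))                    # small coefficients spread over columns
print(pls.x_weights_.ravel().round(3))         # a single direction in factor space
```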
Some other modifications and extensions of PLS:
- SIMPLS: a variant that can be dramatically more efficient to compute when there are many factors.
- Continuum regression: adds a continuous parameter $\alpha \in [0, 1]$, allowing the modeling method to vary continuously between MLR ($\alpha = 0$), PLS ($\alpha = 0.5$), and PCR ($\alpha = 1$).
PLS has become an established tool in chemometric modeling, largely because it is often possible to interpret the extracted factors in terms of the underlying physical system, i.e., to derive “hard” modeling information from the soft model.
Reference
- Tobias, Randall D. (1996). “An Introduction to Partial Least Squares Regression.” SAS Institute Inc., Cary, NC.