Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification, and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is why. How can a procedure designed principally for over-determined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support, owing to the relationship between PLS and canonical correlation analysis (CCA) and, in turn, the relationship between CCA and linear discriminant analysis (LDA). This dissertation replaces these heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over principal components analysis (PCA) when discrimination is the goal and dimension reduction is needed.