### 25.2 Regression LFM’s

Key points

• In regression LFM’s (25.46) the factors are observable, and the loadings are constructed to maximize the r-squared (25.46)-(25.47).
• Regression LFM’s work better when the correlation between target variables and factors is high and the correlation among the factors is low (25.60).
• The regression prediction (25.58) has a dual geometric interpretation, as the best linear prediction (25.87) and as the orthogonal projection (25.88) of the target variables onto the linear span of the factors.

The first of the three main classes of dominant-residual LFM’s (25.23) is the class of regression LFM’s. Regression LFM’s are also known as “macroeconomic” LFM’s, because in some applications the factors are macroeconomic variables, such as interest rates, stock market returns, etc. The purpose of regression models is explanatory: we want to explain a given large or small (even one) number $\bar{n}$ of target variables $X$, such as some S&P 500 stock returns, as much as possible in terms of a large or small number $\bar{k}$ of observable factors $Z$, such as one or more total return indexes.

Regression models can be useful either to perform dimension reduction, e.g. to draw Monte Carlo scenarios of large-dimensional variables (see Section 18.7.6); or to perform risk attribution, e.g. to compute the optimal hedge or to attribute the risk in the portfolio P&L to a few key drivers (see Step 8a).

Beyond the financial industry, regression models are extensively implemented in many mathematical fields, most notably supervised learning (Chapters 27-28), as shown in Figure 24.1, and they have become very popular in the statistical literature because of their appealing mathematical features, see Section 25.6.13.

For example, it is common practice to use regression models for prediction and forecasting (Section 26.2), say to understand simple linear causal relationships between the target variables (or dependent variables), such as tomorrow’s temperature in London, and the factors (or independent variables), such as some current measurements in England (average temperature, atmospheric pressure, humidity, etc.) [W].

#### 25.2.1 Definition

A regression, or macroeconomic, linear factor model (LFM) for an $\bar{n}$-dimensional target variable $X$ is a dominant-residual decomposition (25.23)

$$ \underbrace{X}_{\text{observable}} \;=\; \underbrace{\alpha}_{\text{constructed}} \;+\; \underbrace{\beta}_{\text{constructed}} \times \underbrace{Z}_{\text{observable}} \;+\; \underbrace{\mathring{\varepsilon}}_{\text{constructed}}. \tag{25.46} $$

In (25.46) the target variables $X$ and the factors $Z$ are observable. In particular, we assume that the mean-covariance equivalence class of their joint distribution (25.13), and thus their joint expectations and covariances, is known. Then the loadings matrix $\beta$ is constructed in such a way as to maximize the r-squared of the given factors. More precisely, consider:

i) a symmetric and positive-definite scale matrix $\sigma^{2}$ that defines the r-squared objective (25.21);

ii) a number $\bar{k}$ of observable factors $Z$.

Then a regression LFM is a dominant-residual LFM (25.24)

$$ (\alpha, \beta, Z) \;\in\; \operatorname*{argmax}_{(a, b, F) \in \mathcal{C}} \; R^{2}_{\sigma^{2}}\{a + b \times F \,\|\, X\}, \tag{25.47} $$

where $\sigma$ is the Riccati root of $\sigma^{2}$ (14.440), and the constraints are:

i) the factors are given exogenously, $F \equiv Z$;

ii) the residuals have zero expectation, $E\{\mathring{\varepsilon}\} = 0$, which with i) implies $a = E\{X\} - b \times E\{Z\}$.

Therefore, the constraints become

$$ \mathcal{C} \;\equiv\; \left\{ (a, b, F) : \; b \text{ free}, \; F = Z, \; a = E\{X\} - b \times E\{Z\} \right\}. \tag{25.48} $$

Then the matrix $b$ is optimized in (25.47) to yield the loadings $\beta$, which maximize the r-squared.

In the estimation context the dominant-residual regression framework (25.47) becomes the ordinary least squares optimization (25.112).

Regression LFM’s are arguably the most implemented models for supervised learning because of their intuitive features (Section 25.6.13). Furthermore, regression LFM’s can be mapped from the affine format (25.46) to an equivalent linear format by considering the constant as an additional factor, see Section 27.1.4.
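As a rough numerical sketch of this affine-to-linear mapping (not the book's code; the moment values are hypothetical, chosen to match Example 25.9 below), appending the constant factor and working with non-central second moments recovers the affine coefficients:

```python
import numpy as np

# hypothetical joint moments of (X, Z), as in Example 25.9 below
e_x, e_z = 3.0, np.array([3.0, 3.0])
cv_xz = np.array([[0.21, 0.35]])                 # Cv{X, Z}
cv_z = np.array([[4.0, 0.6], [0.6, 1.0]])        # Cv{Z}

# affine format (25.46): loadings (25.54) and shift (25.55)
beta = cv_xz @ np.linalg.inv(cv_z)
alpha = e_x - beta @ e_z

# linear format: treat the constant 1 as an extra factor Z' = (1, Z) and
# read the loadings off the non-central second moments E{X Z'}, E{Z' Z'}
e_xz = cv_xz + e_x * e_z[None, :]                # E{X Z} = Cv{X,Z} + E{X}E{Z}
e_xzp = np.hstack([np.array([[e_x]]), e_xz])     # E{X (1, Z)}
e_zz = cv_z + np.outer(e_z, e_z)                 # E{Z Z}
e_zpzp = np.block([[np.ones((1, 1)), e_z[None, :]],
                   [e_z[:, None], e_zz]])        # E{(1, Z)(1, Z)}
beta_prime = e_xzp @ np.linalg.inv(e_zpzp)

print(alpha, beta)   # the first entry of beta_prime recovers alpha,
print(beta_prime)    # the remaining entries recover beta
```

The augmented loadings solve the linear (through-the-origin) least squares problem, so their first entry reproduces the shift and the rest reproduce the loadings of the affine format.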

In Figure 25.5 we show the intuition and the pitfalls behind linear ordinary least squares regression.
Consider a univariate target variable $X$, such as one stock’s compounded return, and one observable factor $Z$, such as the compounded return of a different stock. To illustrate, suppose that the variables are jointly bivariate normal (18.94)

$$ \begin{pmatrix} X \\ Z \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_{X} \\ \mu_{Z} \end{pmatrix}, \begin{pmatrix} \sigma^{2}_{X} & \varrho_{X,Z}\,\sigma_{X}\sigma_{Z} \\ \varrho_{X,Z}\,\sigma_{X}\sigma_{Z} & \sigma^{2}_{Z} \end{pmatrix} \right). \tag{25.49} $$

We show a large number of joint scenarios and the respective marginal histograms. We also draw the least squares regression line

$$ x \;=\; \underbrace{E\{X\} - \frac{Cv\{X,Z\}}{V\{Z\}}\,E\{Z\}}_{\alpha} \;+\; \underbrace{\frac{Cv\{X,Z\}}{V\{Z\}}}_{\beta}\; z, \tag{25.50} $$

together with an arbitrary line $x = a + b\,z$. The regression line (25.50) is the one that best fits the scenarios among all possible lines.
Indeed, the area corresponding to the squared errors of the regression line (yellow) is smaller than the area corresponding to the squared errors of the arbitrary line (blue).
Once we have fitted the regression line to the distribution (25.49), the prediction (25.15) is a random variable that takes values on the line (25.50). We can then use the prediction to postulate, or observe, a value $z$ for the factor and infer the predicted mean for the target variable.
We can then change the distribution (25.49), for instance by varying the correlation $\varrho_{X,Z}$. As the distribution changes, the regression line (25.50) adapts and maintains the best least squares fit (25.47).
To better analyse the fit, we focus on the joint distribution of the prediction $\bar{X}$ and the target variable $X$, and the respective r-squared (25.21). As the joint distribution of the target and the factor varies, the prediction is always positively correlated with the target variable.
To better understand the residual $\mathring{\varepsilon}$, we show the joint distribution of the prediction $\bar{X}$ and the residual $\mathring{\varepsilon}$. As the fit, or r-squared (25.21), increases, the residual decreases. Regardless, the prediction is always uncorrelated with the residual, and therefore the mean-covariance ellipse (21.71) has its principal axes parallel to the reference axes (21.85).
These key features are preserved regardless of the distribution of the target $X$ and the factor $Z$. For instance, consider the case where $X$ and $Z$ are the prices of the above two stocks, and thus they are jointly lognormal (18.198)

$$ \begin{pmatrix} X \\ Z \end{pmatrix} \sim LogN\!\left( \begin{pmatrix} \mu_{X} \\ \mu_{Z} \end{pmatrix}, \begin{pmatrix} \sigma^{2}_{X} & \varrho_{X,Z}\,\sigma_{X}\sigma_{Z} \\ \varrho_{X,Z}\,\sigma_{X}\sigma_{Z} & \sigma^{2}_{Z} \end{pmatrix} \right). \tag{25.51} $$

Then again the least squares regression line (25.50) is the one that, among all possible lines, achieves the best fit in terms of the squared residuals (25.47).
As the first two moments of the distribution (25.51) change, the prediction (25.15) provides a linear approximation to a possibly nonlinear relationship between target and factor.
However, the prediction remains positively correlated with the target $X$, it remains uncorrelated with the residual $\mathring{\varepsilon}$, and the fit improves as the residual decreases.
Linear least squares regression also works with discrete variables. To illustrate, consider the mixture case (18.514), where the target is binary: $X \equiv 1$ if the stock goes up and $X \equiv 0$ if it goes down (18.378)

$$ X \sim \operatorname{Bernoulli}(p) \tag{25.52} $$

and the factor is conditionally normal (18.519)

$$ Z \,|\, x \sim N(\mu_{x}, \sigma^{2}_{x}). \tag{25.53} $$

The regression line (25.50) allows us to predict the mean of the target variable, which is the probability of the stock going up. Again, the prediction (25.15) is positively correlated with the target $X$, and uncorrelated with the residual $\mathring{\varepsilon}$.
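A quick simulation of this mixture (with hypothetical values for $p$, $\mu_x$, $\sigma_x$) confirms both properties:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical parameters for (25.52)-(25.53)
p = 0.6                        # probability that the stock goes up
mu_up, mu_dn = 1.0, -0.5       # conditional means of the factor
s_up, s_dn = 0.8, 1.2          # conditional standard deviations

n = 200_000
x = (rng.random(n) < p).astype(float)        # binary target (25.52)
z = np.where(x == 1.0,
             rng.normal(mu_up, s_up, n),
             rng.normal(mu_dn, s_dn, n))     # conditionally normal factor (25.53)

# regression line (25.50) from sample moments
beta = np.cov(x, z)[0, 1] / np.var(z, ddof=1)
alpha = x.mean() - beta * z.mean()
x_bar = alpha + beta * z                     # predicted mean of the binary target
eps = x - x_bar

print(np.corrcoef(x_bar, x)[0, 1])   # positive
print(np.corrcoef(eps, z)[0, 1])     # numerically zero
```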

#### 25.2.2 Loadings

Given the constraints (25.48), the key variables to solve for in the regression LFM optimization (25.47) are the loadings, which can be computed analytically E.25.1

$$ \beta = Cv\{X, Z\}\,(Cv\{Z\})^{-1}. \tag{25.54} $$

Note that the optimal loadings do not depend on the scale matrix $\sigma^{2}$ specifying the r-squared (25.21).

Given the optimal loadings (25.54), we obtain the optimal shift from the zero expectation constraint (25.48)

$$ \alpha = E\{X\} - Cv\{X, Z\}\,(Cv\{Z\})^{-1}\,E\{Z\}. \tag{25.55} $$

Note that all that matters to compute the optimal loadings (25.54) and the optimal shift (25.55) are the first two moments of the joint distribution of $(X, Z)$. Hence, all the results hold on the mean-covariance equivalence class (25.13).

Example 25.9. Consider a univariate target variable $X$ ($\bar{n} = 1$) with two observable factors $Z$ ($\bar{k} = 2$), which are jointly normal $(X, Z) \sim N(\mu_{X,Z}, \sigma^{2}_{X,Z})$, where

$$ \mu_{X,Z} \equiv \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix}, \qquad \sigma^{2}_{X,Z} \equiv \begin{pmatrix} 1 & 0.21 & 0.35 \\ 0.21 & 4 & 0.6 \\ 0.35 & 0.6 & 1 \end{pmatrix}. \tag{25.56} $$

Then the optimal shift (25.55) and loadings (25.54) read

$$ \alpha = 1.95, \qquad \beta = \begin{pmatrix} 0 & 0.35 \end{pmatrix}. \tag{25.57} $$
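The numbers in (25.57) can be checked directly from the moments (25.56) via (25.54)-(25.55); a minimal sketch:

```python
import numpy as np

# moments of Example 25.9, see (25.56)
mu_xz = np.array([3.0, 3.0, 3.0])
s2_xz = np.array([[1.0, 0.21, 0.35],
                  [0.21, 4.0, 0.6],
                  [0.35, 0.6, 1.0]])

cv_xz = s2_xz[0, 1:]        # Cv{X, Z}
cv_z = s2_xz[1:, 1:]        # Cv{Z}

beta = cv_xz @ np.linalg.inv(cv_z)        # loadings (25.54)
alpha = mu_xz[0] - beta @ mu_xz[1:]       # shift (25.55)

print(alpha)   # ~1.95
print(beta)    # ~[0, 0.35]
```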

#### 25.2.3 Prediction and fit

The regression prediction (25.15) becomes

$$ \bar{X}^{Reg} = E\{X\} + Cv\{X, Z\}\,(Cv\{Z\})^{-1}\,(Z - E\{Z\}). \tag{25.58} $$

This equation shows that the regression prediction lives in a $\bar{k}$-dimensional subspace, embedded in the $(\bar{n} + \bar{k})$-dimensional space of the original target variable $X$ and factors $Z$, see Figure 25.6.

Example 25.10. We continue from Example 25.9. The regression prediction (25.58) is normally distributed $\bar{X}^{Reg} \sim N(\mu_{\bar{X}^{Reg}}, \sigma^{2}_{\bar{X}^{Reg}})$, where S.25.2

$$ \mu_{\bar{X}^{Reg}} = 3, \qquad \sigma^{2}_{\bar{X}^{Reg}} = 0.12. \tag{25.59} $$
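The moments in (25.59) follow from (25.58): the prediction mean equals $E\{X\}$ and the prediction variance is $Cv\{X,Z\}(Cv\{Z\})^{-1}Cv\{Z,X\}$. A quick numerical check (the exact value is 0.1225, rounded to 0.12 in (25.59)):

```python
import numpy as np

# moments of Example 25.9, see (25.56)
mu_xz = np.array([3.0, 3.0, 3.0])
cv_xz = np.array([0.21, 0.35])                 # Cv{X, Z}
cv_z = np.array([[4.0, 0.6], [0.6, 1.0]])      # Cv{Z}

mu_pred = mu_xz[0]                             # E{X_bar} = E{X}
var_pred = cv_xz @ np.linalg.inv(cv_z) @ cv_xz # V{X_bar}

print(mu_pred, var_pred)   # 3.0 and ~0.1225
```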

The r-squared (25.21) provided by the prediction (25.58) reads E.25.2

$$ R^{2}_{\sigma^{2}}\{\bar{X}^{Reg} \,\|\, X\} = \frac{tr\!\left[ Cv\{\sigma^{-1}X, Z\}\, Cv\{Z\}^{-1}\, Cv\{Z, \sigma^{-1}X\} \right]}{tr\!\left( Cv\{\sigma^{-1}X\} \right)}, \tag{25.60} $$

where $Cv\{\sigma^{-1}X, Z\}$ is the covariance matrix of the (scaled) target variables and the factors, and $Cv\{Z\}$ is the covariance matrix of the factors.

Since a model that fits the target variable well must have a high r-squared, the factors should be as uncorrelated among each other as possible. If there are high correlations among the factors, i.e. if there is high collinearity, the matrix $Cv\{Z\}$ becomes ill-conditioned [W] and possibly singular. Geometrically, this means that the space of the prediction (25.58) is not properly identified, see Figure 25.6.

Example 25.11. Regression linear factor model for different correlations

Consider a univariate target variable $X$ and two observable factors $Z \equiv (Z_{1}, Z_{2})'$. Suppose the variables are jointly normal $(X, Z) \sim N(\mu_{X,Z}, \sigma^{2}_{X,Z})$, where

$$ \mu_{X,Z} \equiv \begin{pmatrix} \mu_{X} \\ \mu_{Z_{1}} \\ \mu_{Z_{2}} \end{pmatrix}, \qquad \sigma^{2}_{X,Z} \equiv \begin{pmatrix} \sigma^{2}_{X} & \sigma_{X,Z_{1}} & \sigma_{X,Z_{2}} \\ \sigma_{Z_{1},X} & \sigma^{2}_{Z_{1}} & \sigma_{Z_{1},Z_{2}} \\ \sigma_{Z_{2},X} & \sigma_{Z_{2},Z_{1}} & \sigma^{2}_{Z_{2}} \end{pmatrix}. \tag{25.61} $$

In Figure 25.6 we compare the target variable $X$ and the corresponding regression prediction (25.58), which is the projection of $X$ onto the regression plane. More precisely:
-) In the left plot we show the simulations of $(X, Z_{1}, Z_{2})$, generated by varying the entries of the correlation matrix, along with the regression plane.
-) In the right plot we show the corresponding projected simulations of the prediction.
Notice that when the factors are collinear, i.e. $\varrho_{Z_{1},Z_{2}} \approx \pm 1$, the predicted simulations (green dots) shrink along a line, which is a one-dimensional subspace of the regression plane (see the right plot). This means that any two-dimensional regression plane passing through that line is admissible (see the left plot), and so the space of the prediction is not properly identified.
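The effect of collinearity on the loadings (25.54) can be illustrated numerically: as the factor correlation approaches one, $Cv\{Z\}$ becomes ill-conditioned and the loadings are no longer stably identified (the covariances below are hypothetical):

```python
import numpy as np

cv_xz = np.array([0.30, 0.32])   # hypothetical Cv{X, Z}

for rho in [0.0, 0.9, 0.999999]:
    cv_z = np.array([[1.0, rho], [rho, 1.0]])   # factor covariance
    beta = cv_xz @ np.linalg.inv(cv_z)          # loadings (25.54)
    print(rho, np.linalg.cond(cv_z), beta)
```

As the correlation approaches one, the condition number of $Cv\{Z\}$ explodes and the loadings blow up in magnitude, while any of the (near-)admissible regression planes fits almost equally well.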

Through the explicit expression of the r-squared (25.60) we can determine the best pool of observable factors, see Section 39.4.1.

Furthermore, the r-squared expression (25.60) supports the intuition that a “good” model must display high overall correlations between the target variables $X$ and the factors $Z$. Indeed, consider a simple case with a univariate target variable $X$ and one factor $Z$, which are jointly normal with correlation $\varrho_{X,Z}$. If we set $\sigma \equiv 1$ in the r-squared (25.60), we obtain that the r-squared is the squared correlation between the factor and the target variable

$$ R^{2}_{\sigma^{2}}\{\bar{X}^{Reg} \,\|\, X\} = \varrho^{2}_{X,Z}. \tag{25.62} $$
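To see this, note that in the univariate case the loadings (25.54) read $\beta = Cv\{X,Z\}/V\{Z\} = \varrho_{X,Z}\,\sigma_{X}/\sigma_{Z}$, so that

$$ V\{\bar{X}^{Reg}\} = \beta^{2}\, V\{Z\} = \varrho^{2}_{X,Z}\,\sigma^{2}_{X}, \qquad R^{2} = \frac{V\{\bar{X}^{Reg}\}}{V\{X\}} = \varrho^{2}_{X,Z}. $$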

Example 25.12. We continue from Example 25.10. The r-squared (25.60) reads S.25.2

$$ R^{2}_{\sigma^{2}}\{\bar{X}^{Reg} \,\|\, X\} = 0.12. \tag{25.63} $$
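Numerically, with $\sigma \equiv 1$ the trace formula (25.60) applied to the moments (25.56) gives the exact value 0.1225, rounded to 0.12 in (25.63); a minimal check:

```python
import numpy as np

# moments of Example 25.9, see (25.56); scale matrix sigma = identity
cv_x = np.array([[1.0]])                         # Cv{X}
cv_xz = np.array([[0.21, 0.35]])                 # Cv{X, Z}
cv_z = np.array([[4.0, 0.6], [0.6, 1.0]])        # Cv{Z}

# r-squared (25.60)
r2 = np.trace(cv_xz @ np.linalg.inv(cv_z) @ cv_xz.T) / np.trace(cv_x)
print(r2)   # ~0.1225
```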

#### 25.2.4 Residuals features

From the regression prediction (25.58) we can compute the residuals $\mathring{\varepsilon} \equiv X - \bar{X}^{Reg}$, or

$$ \mathring{\varepsilon} = X - E\{X\} - Cv\{X, Z\}\,(Cv\{Z\})^{-1}\,(Z - E\{Z\}). \tag{25.64} $$

The residuals are uncorrelated with the factors E.25.3

$$ Cv\{\mathring{\varepsilon}_{n}, Z_{k}\} = 0, \tag{25.65} $$

and thus the factors are systematic (25.27). As a matter of fact, the optimal matrix of regression loadings (25.54) is, among all matrices $b$, the only one that makes the residuals uncorrelated with the factors. See Section 25.6.5 for more on this profound result.

Finally, the explicit expression of the covariance of the residuals reads E.25.4

$$ Cv\{\mathring{\varepsilon}\} = Cv\{X\} - Cv\{X, Z\}\,Cv\{Z\}^{-1}\,Cv\{Z, X\}. \tag{25.66} $$

Hence, the residuals are in general not uncorrelated with each other

$$ Cv\{\mathring{\varepsilon}_{n}, \mathring{\varepsilon}_{m}\} \neq 0, \tag{25.67} $$

and thus they are not idiosyncratic (25.28). We discuss this pitfall further in Section 25.6.5.

Similar to the r-squared (25.60), the covariance expression (25.66) again supports the intuition that a “good” model must display high overall correlations between the target variables $X$ and the factors $Z$. Indeed, consider a simple case with a univariate target variable $X$ and one factor $Z$, which are jointly normal with correlation $\varrho_{X,Z}$. Then the residual variance, which reads E.25.14

$$ V\{\mathring{\varepsilon}\} = \sigma^{2}_{X}\,(1 - \varrho^{2}_{X,Z}), \tag{25.68} $$

is minimal when the r-squared (25.62) is maximal, or $\varrho_{X,Z} = \pm 1$.

Example 25.13. We continue from Example 25.12. The residual $\mathring{\varepsilon}$ and the factors $Z$ are jointly normal $(\mathring{\varepsilon}, Z) \sim N(\mu_{\mathring{\varepsilon},Z}, \sigma^{2}_{\mathring{\varepsilon},Z})$, where

$$ \mu_{\mathring{\varepsilon},Z} = \begin{pmatrix} 0 \\ 3 \\ 3 \end{pmatrix}, \qquad \sigma^{2}_{\mathring{\varepsilon},Z} = \begin{pmatrix} 0.88 & 0 & 0 \\ 0 & 4 & 0.6 \\ 0 & 0.6 & 1 \end{pmatrix}. \tag{25.69} $$

Notice that the factors are systematic (25.65). S.25.2
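The entries of (25.69) can be reproduced from the moments (25.56): the residual variance follows from (25.66) (the exact value is 0.8775, rounded to 0.88), and the zero covariances between residual and factors follow from (25.65). A minimal check:

```python
import numpy as np

# moments of Example 25.9, see (25.56)
s2_xz = np.array([[1.0, 0.21, 0.35],
                  [0.21, 4.0, 0.6],
                  [0.35, 0.6, 1.0]])
cv_x, cv_xz, cv_z = s2_xz[:1, :1], s2_xz[:1, 1:], s2_xz[1:, 1:]

beta = cv_xz @ np.linalg.inv(cv_z)                     # loadings (25.54)
cv_eps = cv_x - cv_xz @ np.linalg.inv(cv_z) @ cv_xz.T  # residual covariance (25.66)
cv_eps_z = cv_xz - beta @ cv_z                         # Cv{eps, Z}, cf. (25.65)

print(cv_eps)     # ~0.8775 (rounded to 0.88 in (25.69))
print(cv_eps_z)   # ~[[0, 0]]
```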

#### 25.2.5 Natural scatter specification

So far we have left unspecified the scale matrix $\sigma^{2}$ that defines the r-squared (25.21) in the regression LFM optimization (25.47)-(25.48).

For any choice of $\sigma^{2}$ we always obtain the same loadings (25.54) and hence the same dominant-residual decomposition (25.23)-(25.24). For this reason, the natural choice for $\sigma^{2}$ is simply the identity

$$ \sigma^{2} \equiv I_{\bar{n}}. \tag{25.70} $$

#### 25.2.6 The ∥ projection operator

The linear regression (25.46) allows us to introduce a projection operator $\|$, which can be thought of as the linear counterpart of the operation of conditioning; and two ensuing summary statistics, which can be thought of as the linear counterparts of the conditional distribution (18.54).

Just as the conditional distribution represents all that can be inferred about $X$ from i) knowledge of $Z = z$ and ii) the joint distribution of $(X, Z)$, the two summary statistics $E\{X \| z\}$ and $Cv\{X \| z\}$ that we introduce below represent all that can be inferred about $X$ from i) knowledge of $Z = z$ and ii) the mean-covariance equivalence class of the joint distribution (25.13), as we summarize below.