### 26.2 Regression LFM’s

Key points

• In regression LFM’s (26.36) the factors are observable and the loadings are constructed to maximize the r-squared (26.36)-(26.37).
• Regression LFM’s work best when the correlation between target variables and factors is high and the correlation among the factors is low (26.47).
• The regression prediction (26.45) has a dual geometrical interpretation, as the best linear prediction (26.73) and as the orthogonal projection (26.74) of the target variables onto the linear span of the factors.

The first of the three main classes of dominant-residual LFM’s (26.13) is the class of regression LFM’s. In regression LFM’s, the factors are given exogenously, and the loadings are constructed.

The purpose of regression models is explanatory: given a number ¯n of target variables X, be it large or small (even one), the goal is to explain as much as possible of the randomness in X in terms of a few (¯k ≪ ¯n) observable factors Z.

Regression LFM’s are also known as “macroeconomic” LFM’s, because in some applications the factors are macroeconomic variables, such as interest rates, stock market returns, etc.

Finally, regression LFM’s naturally generalize to supervised learning models (Sections 28.1-28.3), as shown in Figure 25.1, and became very popular in the statistical literature because of many other appealing features, see Section 26.6.13.

#### 26.2.1 Definition

A regression, or macroeconomic, linear factor model (LFM) for an ¯n-dimensional target variable X is a dominant-residual decomposition (26.13)

 X = α + β×Z + ˚ε, (26.36)

where X and Z are observable, while α, β and ˚ε are constructed.

In (26.36) the target variables X and the factors Z are observable. In particular, we assume that the equivalence class of their joint distribution (26.7) is known, and thus their joint expectations and covariances. Then the loadings matrix β is constructed so as to maximize the r-squared of the given factors. More precisely, consider:

i) a symmetric and positive-definite matrix σ² that defines the r-squared objective (26.11);

ii) and a number ¯k of observable factors Z.

Then a regression LFM is a dominant-residual LFM (26.14)

$$ (\alpha, \beta) \equiv \operatorname*{argmax}_{(a,b,F)\in\mathcal{C}} \mathcal{R}^{2}_{\sigma^{2}}\{a + b\,F \,\|\, X\}, \tag{26.37} $$

where σ is the Riccati root of σ² (15.440), and the constraints are:

i) the factors are given exogenously, F ≡ Z;

ii) the residuals have zero expectation, E{˚ε} = 0, which with i) implies a = E{X} − bE{Z}.

Therefore, the constraints become

 C ≡ {(a, b, F) : b free, F = Z, a = E{X} − bE{Z}}. (26.38)

Then the matrix b is optimized in (26.37) to yield the loadings β, which maximize the r-squared.

In the estimation context the dominant-residual regression framework (26.37) becomes the ordinary least squares optimization (26.98).

Regression LFM’s are arguably the most widely implemented models for supervised learning because of their intuitive features (Section 26.6.13). Furthermore, regression LFM’s can be mapped from the affine format (26.36) to an equivalent linear format by considering the constant as an additional factor, see Section 28.1.4.
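The affine-to-linear mapping can be sketched in a few lines of numpy by stacking the constant 1 as an additional factor; the moments below are hypothetical numbers chosen for illustration.

```python
import numpy as np

# Illustrative (hypothetical) moments for a univariate target X and two factors Z
mu_X = np.array([3.0])
mu_Z = np.array([3.0, 3.0])
cv_XZ = np.array([[0.21, 0.35]])            # Cv{X, Z}
cv_Z = np.array([[4.0, 0.6], [0.6, 1.0]])   # Cv{Z}

# Affine format (26.36): prediction alpha + beta @ z
beta = cv_XZ @ np.linalg.inv(cv_Z)   # optimal loadings (26.41)
alpha = mu_X - beta @ mu_Z           # optimal shift (26.42)

# Equivalent linear format: stack the constant 1 as an additional factor,
# and absorb the shift alpha into the augmented loadings
b_lin = np.hstack([alpha.reshape(-1, 1), beta])
z = np.array([2.0, 5.0])
z_aug = np.concatenate([[1.0], z])
# alpha + beta @ z coincides with b_lin @ z_aug
```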

Consider a univariate target variable X and one observable factor Z. Suppose the variables are jointly normal, where

$$ \begin{pmatrix} X \\ Z \end{pmatrix} \sim \mathrm{N}\!\left( \begin{pmatrix} \mu_{X} \\ \mu_{Z} \end{pmatrix}, \begin{pmatrix} \sigma_{X}^{2} & \varrho_{X,Z}\sigma_{X}\sigma_{Z} \\ \varrho_{X,Z}\sigma_{X}\sigma_{Z} & \sigma_{Z}^{2} \end{pmatrix} \right). \tag{26.39} $$

In Figure 26.4, we show how the regression line, as specified by the regression parameters in (26.37)

$$ x = \underbrace{\mu_{X} - \varrho_{X,Z}\frac{\sigma_{X}}{\sigma_{Z}}\,\mu_{Z}}_{\alpha} + \underbrace{\varrho_{X,Z}\frac{\sigma_{X}}{\sigma_{Z}}}_{\beta}\, z, \tag{26.40} $$

represents, among all possible lines x = a + bz, the one that best fits the joint distribution, represented by a large set of simulations.
Then, we observe how the area corresponding to the squared errors of the regression line (red) is smaller than the area corresponding to the squared errors of an arbitrary line (blue) for two arbitrary simulations. This is not surprising, since the regression parameters are those that on average yield the least squared errors (26.37).
The regression parameters define the best-fit line in the same way as the expectation and covariance define the best-fit ellipsoid in Example 22.24.
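The coefficients of the regression line (26.40) coincide with the general moment formulas (26.41)-(26.42) specialized to the scalar case; a minimal numerical check, with parameter values assumed purely for illustration:

```python
import numpy as np

# Hypothetical bivariate-normal parameters (assumed for illustration)
mu_X, mu_Z = 1.0, 2.0
sig_X, sig_Z, rho = 2.0, 3.0, 0.7

# Regression-line coefficients, as in (26.40)
beta_line = rho * sig_X / sig_Z
alpha_line = mu_X - beta_line * mu_Z

# The same coefficients from the general moment formulas
cv_XZ = rho * sig_X * sig_Z      # Cv{X, Z}
beta = cv_XZ / sig_Z**2          # (26.41) in the scalar case
alpha = mu_X - beta * mu_Z       # (26.42)
```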

Given the constraints (26.38), the key variables to solve for in the regression LFM optimization (26.37) are the loadings β, which can be computed analytically E.26.1

 β=Cv{X,Z}(Cv{Z})−1. (26.41)

Note that the optimal loadings do not depend on the scale matrix σ² specifying the r-squared (26.11).

Given the optimal loadings (26.41), from the zero-expectation constraint (26.38) we obtain the optimal shift

 α=E{X}−Cv{X,Z}(Cv{Z})−1E{Z}. (26.42)

Note that all that matters to compute the optimal loadings (26.41) and the optimal shift (26.42) are the first two moments of the joint distribution of (X, Z). Hence, any result holds on an equivalence class of distributions identified by the first two moments (26.7).

Example 26.5. Consider a univariate target variable (¯n = 1) with two observable factors (¯k = 2), which are jointly normal, where

$$ \mu_{X,Z} \equiv \begin{pmatrix} 3 \\ 3 \\ 3 \end{pmatrix}, \qquad \sigma^{2}_{X,Z} \equiv \begin{pmatrix} 1 & 0.21 & 0.35 \\ 0.21 & 4 & 0.6 \\ 0.35 & 0.6 & 1 \end{pmatrix}. \tag{26.43} $$

Then the optimal shift (26.42) and loadings (26.41) read

 α = 1.95, β = (0 0.35). (26.44)
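The values in (26.44) can be reproduced from the moments (26.43) with a few lines of numpy; the slicing of the joint covariance into Cv{X, Z} and Cv{Z} follows the ordering (X, Z1, Z2) in (26.43).

```python
import numpy as np

# Joint moments from (26.43): first entry is X, last two are Z1, Z2
mu = np.array([3.0, 3.0, 3.0])
sig2 = np.array([[1.0, 0.21, 0.35],
                 [0.21, 4.0, 0.6],
                 [0.35, 0.6, 1.0]])

cv_XZ = sig2[0:1, 1:]   # Cv{X, Z}
cv_Z = sig2[1:, 1:]     # Cv{Z}

beta = cv_XZ @ np.linalg.inv(cv_Z)   # optimal loadings (26.41)
alpha = mu[0] - beta @ mu[1:]        # optimal shift (26.42)
# alpha = 1.95 and beta = (0, 0.35), matching (26.44)
```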

#### 26.2.3 Prediction and fit

The regression prediction (26.3) becomes

 ¯¯¯¯XReg=E{X}+Cv{X,Z}(Cv{Z})−1(Z−E{Z}). (26.45)

This equation shows that the regression prediction lives in a ¯k-dimensional subspace, embedded in the (¯n + ¯k)-dimensional space of the original target variable X and factors Z, see Figure 26.5.

Example 26.6. We continue from Example 26.5. The regression prediction (26.45) is normally distributed, where S.26.2

 μ¯¯¯XReg=3,σ2¯¯¯XReg=0.12. (26.46)
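The prediction moments (26.46) can be checked with numpy, reusing the moments (26.43): the prediction preserves the expectation of X, while its variance is β Cv{Z} β′.

```python
import numpy as np

# Moments from Example 26.5 / (26.43)
mu = np.array([3.0, 3.0, 3.0])
sig2 = np.array([[1.0, 0.21, 0.35],
                 [0.21, 4.0, 0.6],
                 [0.35, 0.6, 1.0]])
cv_Z = sig2[1:, 1:]
beta = sig2[0:1, 1:] @ np.linalg.inv(cv_Z)

# Moments of the prediction (26.45)
mu_pred = mu[0]                    # expectation of X is preserved
var_pred = beta @ cv_Z @ beta.T    # approximately 0.12, as in (26.46)
```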

The r-squared (26.11) provided by the prediction (26.45) reads E.26.2

$$ \mathcal{R}^{2}_{\sigma^{2}}\{\bar{X}^{\mathit{Reg}} \,\|\, X\} = \frac{\operatorname{tr}\!\left[ Cv\{\sigma^{-1}X, Z\}\,(Cv\{Z\})^{-1}\,Cv\{Z, \sigma^{-1}X\} \right]}{\operatorname{tr}\!\left( Cv\{\sigma^{-1}X\} \right)}, \tag{26.47} $$

where Cv{σ⁻¹X, Z} is the covariance of the scaled target variables with the factors, and Cv{Z} is the covariance matrix of the factors.

Since a model that fits the target variable well must have a high r-squared, i.e. R² ≈ 1, the factors should be as uncorrelated with each other as possible. If there are high correlations among the factors, or if there is high collinearity, the matrix Cv{Z} becomes ill-conditioned and possibly singular. Geometrically, this means that the space of the prediction (26.45) is not properly identified, see Figure 26.5.
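A quick numerical illustration of this ill-conditioning: for two unit-variance factors with correlation ϱ, the eigenvalues of Cv{Z} are 1 ± ϱ, so the condition number (1 + ϱ)/(1 − ϱ) blows up as the factors become collinear. The correlation values below are arbitrary.

```python
import numpy as np

# Condition number of Cv{Z} for two unit-variance factors with correlation rho;
# inverting Cv{Z} in (26.41) becomes numerically unstable as rho -> 1
for rho in (0.0, 0.9, 0.999):
    cv_Z = np.array([[1.0, rho], [rho, 1.0]])
    print(f"rho = {rho}: cond(Cv{{Z}}) = {np.linalg.cond(cv_Z):.0f}")
```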

Example 26.7. Regression linear factor model for different correlations

Consider a univariate target variable X and two observable factors Z ≡ (Z1, Z2)′. Suppose the variables are jointly normal, where

$$ \mu_{X,Z} \equiv \begin{pmatrix} \mu_{X} \\ \mu_{Z_{1}} \\ \mu_{Z_{2}} \end{pmatrix}, \qquad \sigma^{2}_{X,Z} \equiv \begin{pmatrix} \sigma^{2}_{X} & \sigma_{X,Z_{1}} & \sigma_{X,Z_{2}} \\ \sigma_{Z_{1},X} & \sigma^{2}_{Z_{1}} & \sigma_{Z_{1},Z_{2}} \\ \sigma_{Z_{2},X} & \sigma_{Z_{2},Z_{1}} & \sigma^{2}_{Z_{2}} \end{pmatrix}. \tag{26.48} $$

In Figure 26.5 we compare the target variable X with the corresponding regression prediction (26.45), which is the projection of X onto the regression plane. More precisely:
-) In the left plot we show the simulations of (X, Z1, Z2), generated by varying the entries of the correlation matrix in (26.48), along with the regression plane.
-) In the right plot we show the corresponding projected simulations of the prediction.
Notice that when the factors are collinear, i.e. when their correlation approaches ±1, the predicted simulations (green dots) shrink along a line, which is a one-dimensional subspace of the regression plane (see the right plot). This means that any two-dimensional regression plane passing through that line is admissible (see the left plot), and so the space of the prediction is not properly identified.

Through the explicit expression of the r-squared (26.47) we can determine the best pool of observable factors, see Section 47.3.1.

Furthermore, the r-squared expression (26.47) supports the intuition that a “good” model must display high overall correlations between the target variable X and the factors Z. Indeed, consider a simple case with one univariate target variable and one factor, which are jointly normal with correlation ϱX,Z. If we set σ ≡ σX in the r-squared (26.47), we obtain that the r-squared is the squared correlation between the factor and the target variable

 R2σ2{¯¯¯XReg∥X}=ϱ2X,Z. (26.49)
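The identity (26.49) can be checked numerically in the scalar case by evaluating the trace formula (26.47) directly; the parameter values below are assumptions for illustration.

```python
import numpy as np

# Hypothetical scalar parameters (assumed for illustration)
sig_X, sig_Z, rho = 2.0, 3.0, 0.7
cv_XZ = rho * sig_X * sig_Z   # Cv{X, Z}

# Trace formula (26.47) in the scalar case, with the scale set to sig_X:
# numerator Cv{X/sig_X, Z}^2 / Cv{Z}, denominator Cv{X/sig_X} = 1
r2 = (cv_XZ / sig_X) ** 2 / sig_Z**2
# r2 equals the squared correlation rho**2, as in (26.49)
```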

Example 26.8. We continue from Example 26.6. The r-squared (26.47) reads S.26.2

 R2σ2{¯¯¯XReg∥X}=0.12. (26.50)
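The value (26.50) can be reproduced by evaluating the trace formula (26.47) on the moments (26.43), with the scale matrix set to the identity.

```python
import numpy as np

# Joint covariance from (26.43); first entry is X, last two are Z1, Z2
sig2 = np.array([[1.0, 0.21, 0.35],
                 [0.21, 4.0, 0.6],
                 [0.35, 0.6, 1.0]])
cv_X = sig2[0:1, 0:1]
cv_XZ = sig2[0:1, 1:]
cv_Z = sig2[1:, 1:]

# r-squared via the trace formula (26.47), with the scale matrix set to the identity
r2 = np.trace(cv_XZ @ np.linalg.inv(cv_Z) @ cv_XZ.T) / np.trace(cv_X)
# r2 is approximately 0.12, matching (26.50)
```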

#### 26.2.4 Residuals features

From the regression prediction (26.45) we can compute the residuals ˚ε ≡ X − ¯XReg, or

 ˚ε=X−E{X}−Cv{X,Z}(Cv{Z})−1(Z−E{Z}). (26.51)

The residuals are uncorrelated with the factors E.26.3

 Cv{˚εn,Zk}=0, (26.52)

and thus the factors are systematic (26.17). As a matter of fact, the optimal matrix of regression loadings (26.41) is, among all matrices b, the only one that makes the residuals uncorrelated with the factors. See Section 26.6.5 for more on this profound result.

Finally, the explicit expression of the covariance of the residuals reads E.26.4

 Cv{˚ε}=Cv{X}−Cv{X,Z}Cv{Z}−1Cv{Z,X}. (26.53)

Hence, the residuals are in general correlated with each other

 Cv{˚εn,˚εm}≠0, (26.54)

and thus they are not idiosyncratic (26.18). We discuss this pitfall further in Section 26.6.5.

Similar to the r-squared (26.47), the covariance expression (26.53) again supports the intuition that a “good” model must display high overall correlations between the target variable and the factors. Indeed, consider a simple case with one univariate target variable and one factor, which are jointly normal with correlation ϱX,Z. Then the residual variance, which reads E.26.14

 V{˚ε}=σ2X(1−ϱ2X,Z), (26.55)

is minimal when the r-squared (26.49) is maximal, i.e. when ϱ²X,Z ≈ 1.

Example 26.9. We continue from Example 26.8. The residual ˚ε and the factors Z are jointly normal, where

$$ \mu_{\mathring{\varepsilon},Z} = \begin{pmatrix} 0 \\ 3 \\ 3 \end{pmatrix}, \qquad \sigma^{2}_{\mathring{\varepsilon},Z} = \begin{pmatrix} 0.88 & 0 & 0 \\ 0 & 4 & 0.6 \\ 0 & 0.6 & 1 \end{pmatrix}. \tag{26.56} $$

Notice that the factors are systematic (26.52). S.26.2
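The residual moments (26.56) can be reproduced from the moments (26.43); the check below also verifies that the residual-factor covariances (26.52) vanish.

```python
import numpy as np

# Joint moments of (X, Z) from (26.43)
sig2 = np.array([[1.0, 0.21, 0.35],
                 [0.21, 4.0, 0.6],
                 [0.35, 0.6, 1.0]])
cv_X = sig2[0:1, 0:1]
cv_XZ = sig2[0:1, 1:]
cv_Z = sig2[1:, 1:]
beta = cv_XZ @ np.linalg.inv(cv_Z)   # optimal loadings (26.41)

# Residual covariance (26.53): approximately 0.88, as in (26.56)
cv_eps = cv_X - cv_XZ @ np.linalg.inv(cv_Z) @ cv_XZ.T
# Residual-factor covariance (26.52): identically zero, factors are systematic
cv_eps_Z = cv_XZ - beta @ cv_Z
```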

#### 26.2.5 Natural scatter specification

So far we have left unspecified the scale matrix σ² that defines the r-squared (26.11) in the regression LFM optimization (26.37)-(26.38).

For any choice of σ² we always obtain the same loadings (26.41), and hence the same dominant-residual decomposition (26.13)-(26.14). For this reason, the natural choice for σ² is simply the identity

 σ2≡I¯n. (26.57)

#### 26.2.6 The ∥ projection operator

The linear regression (26.36) allows us to introduce a projection operator, which can be thought of as the linear counterpart of the operation of conditioning, and two ensuing summary statistics, which can be thought of as the linear counterpart of the conditional distribution (19.54).

Just as the conditional distribution represents all that can be inferred about X from i) knowledge of Z and ii) the joint distribution of (X, Z), the two summary statistics E{X∥Z} and Cv{X∥Z} that we introduce below represent all that can be inferred about X from i) knowledge of Z and ii) the first two moments (26.7) of the joint distribution, as we summarize in the table below.

The above table is of capital importance, as it defines the two, and only two, approaches currently used in statistics and finance.

To define the first summary statistic, let us interpret the linear regression prediction (26.73) as the linear projection

 E{X∥Z}≡αX∥Z+βX∥ZZ, (26.58)

where in the notation of the optimal regression coefficients (26.42) and (26.41) we emphasize the dependence on the variables X and Z.

The nomenclature “linear projection” is justified because the linear prediction (26.73) can be interpreted geometrically at the same time as best prediction (15.204) or, equivalently, as orthogonal projection (15.186), as we will show later in (26.74).

Similar to the expectation (22.44), the linear projection (26.58) displays affine equivariance E.22.54

 E{c+dX∥Z}=c+dE{X∥Z}, (26.59)

for any conformable vector c and matrix d.

From the linear projection (26.58) we can define the (point) linear prediction, which is the linear projection (26.58) of the target variable evaluated at a specific realization

 E{X∥z}≡E{X∥Z}|Z=z=αX∥Z+βX∥Zz. (26.60)

To introduce the second summary statistic, let us interpret the residual in linear regression (26.36) as weak innovation

 ˚ε≡X−E{X∥Z}, (26.61)

where the nomenclature “weak” is due to ˚ε being uncorrelated with Z (26.79), as opposed to being independent of Z (“strong” innovation).

We can gain more insight into the residual (26.61) through the linear loss matrix, which is the linear projection (26.58), or regression projection, applied to the squared cross-residuals

 ˜Cv{X∥Z}≡E{˚ε˚ε'∥Z}. (26.62)

The linear loss matrix (26.62) depends on the third-order moments of the variables E.22.55 and thus cannot be known with knowledge of only the first two moments (26.7). Furthermore, it does not identify a symmetric and positive-definite matrix in general E.22.55, except in very special cases E.22.57. For this reason, the linear loss matrix is not a good definition of the second summary statistic.

However, we can define the (linear) error prediction matrix, or partial covariance matrix, as the expectation of the loss matrix projection (26.62)

 Cv{X∥Z}≡E{E{˚ε˚ε'∥Z}}, (26.63)

which can be computed explicitly as

 Cv{X∥Z}=Cv{X}−βX∥ZCv{Z}β'X∥Z=Cv{˚ε}. (26.64)

Note how, unlike the linear loss matrix (26.62), the error prediction matrix (26.63) depends only on the second-order moments of the variables, which are known (26.7), and is clearly symmetric and positive definite.

Furthermore, similar to the covariance (22.45), the error prediction matrix (26.63) displays affine equivariance E.22.54

 Cv{c+dX∥Z}=dCv{X∥Z}d', (26.65)

for any conformable vector c and matrix d.
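The affine equivariance (26.65) can be verified numerically via the moment formula (26.64): transforming the target as c + dX maps the relevant moments to d Cv{X} d′ and d Cv{X, Z}, so the partial covariance transforms as d Cv{X∥Z} d′. The joint covariance below is randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive-definite joint covariance of (X, Z), with 2 targets and 2 factors
A = rng.standard_normal((4, 4))
sig2 = A @ A.T + 4 * np.eye(4)
cv_X, cv_XZ, cv_Z = sig2[:2, :2], sig2[:2, 2:], sig2[2:, 2:]

def partial_cov(cv_X, cv_XZ, cv_Z):
    """Error prediction matrix Cv{X || Z}, as in (26.64)."""
    return cv_X - cv_XZ @ np.linalg.inv(cv_Z) @ cv_XZ.T

# Affine transform c + dX: the shift c drops out of all covariances
d = rng.standard_normal((2, 2))
lhs = partial_cov(d @ cv_X @ d.T, d @ cv_XZ, cv_Z)    # Cv{c + dX || Z}
rhs = d @ partial_cov(cv_X, cv_XZ, cv_Z) @ d.T        # d Cv{X || Z} d'
```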

By construction, the error prediction (26.63) of the target variable evaluated at a specific realization Z = z is constant

 Cv{X∥z}≡Cv{X∥Z