- In regression LFM’s (12.35) the factors are observable and loadings are constructed to maximize the r-squared (12.35)-(12.36).
- Regression LFM’s work better when the target variables and the factors are highly correlated and the factors have low correlation among themselves (12.46).
- The regression prediction (12.44) has a dual geometrical interpretation as best linear prediction (12.70) and as orthogonal projection (12.71) of the target variables onto the linear span of the factors.
The first of the three main classes of dominant-residual LFM’s (12.12) is the class of regression LFM’s. In regression LFM’s, the factors are given exogenously, and the loadings are constructed.
The purpose of regression models is explanatory: given a large or small (even one) number of target variables, the goal is to explain as much as possible of the randomness in the target variables in terms of a few observable factors.
Regression LFM’s are also known as “macroeconomic” LFM’s, because in some applications the factors are macroeconomic variables, such as interest rates, stock market returns, etc.
A regression, or macroeconomic, linear factor model (LFM) for a target variable of arbitrary dimension is a dominant-residual decomposition (12.12) of the target variable into an affine function of the factors plus a residual (12.35).
In (12.35) the target variables and factors are observable. In particular, we assume that the equivalence class of their joint distribution (12.7) is known, and thus so are their joint expectations and covariances. Then, the loadings matrix is constructed so as to maximize the r-squared of the given factors.
More precisely, let us start with:
i) a symmetric and positive-definite scale matrix that defines the r-squared objective (12.10);
ii) a number of observable factors.
Then a regression LFM is a dominant-residual LFM (12.13)
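where the scale matrix enters through its Riccati root (28.413), and the constraints are:
i) the factors are given exogenously, and the prediction is an affine function of the factors for a suitable shift vector;
ii) the residuals have zero expectation, which with i) implies that the shift equals the expectation of the target variables minus the loadings applied to the expectation of the factors.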
Therefore, the constraints become
Then the loadings matrix is optimized in (12.36) to yield the optimal loadings (12.40), which maximize the r-squared.
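The construction above can be sketched in generic notation. The labels are assumptions chosen for this sketch and may differ from the symbols used in (12.35)-(12.41): $X$ denotes the target variables, $Z$ the factors, $\alpha$ the shift, $\beta$ the loadings and $U$ the residuals.

```latex
\begin{aligned}
X &= \alpha + \beta Z + U, \qquad \operatorname{E}\{U\} = 0, \\
\beta &= \operatorname{Cov}\{X, Z\}\,\bigl(\operatorname{Cov}\{Z\}\bigr)^{-1}, \\
\alpha &= \operatorname{E}\{X\} - \beta\,\operatorname{E}\{Z\}.
\end{aligned}
```

The second line is the standard least-squares solution for the loadings, and the third line is the zero-expectation constraint on the residuals solved for the shift.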
Regression LFM’s are arguably the most widely implemented models for supervised learning because of their intuitive features (Section 12.6.13). Furthermore, regression LFM’s can be mapped from the affine format (12.35) to the equivalent linear format by considering the constant as an additional factor, see Section 14.1.4.
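The equivalence between the affine and the linear format can be checked numerically. The following is a minimal sketch on simulated data, where the data-generating coefficients and all variable names are hypothetical: the shift and loadings computed from expectations and covariances coincide with the coefficients of a least-squares fit in which a constant column of ones is appended to the factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one target driven by two factors plus noise
z = rng.normal(size=(5000, 2))
x = 0.3 + z @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=5000)

# Affine format: loadings from covariances, shift from expectations
cov = np.cov(np.column_stack([x, z]), rowvar=False)
beta = np.linalg.solve(cov[1:, 1:], cov[1:, 0])
alpha = x.mean() - beta @ z.mean(axis=0)

# Linear format: the constant 1 is treated as an additional factor
z_aug = np.column_stack([np.ones(len(z)), z])
coef, *_ = np.linalg.lstsq(z_aug, x, rcond=None)

print("affine format:", alpha, beta)
print("linear format:", coef)
```

The first entry of `coef` plays the role of the shift, and the remaining entries are the loadings on the original factors.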
Consider a univariate target variable and one observable factor, and suppose that the two variables are jointly normal. The regression line represents, among all possible lines with zero-expectation residuals, the one that best fits the joint distribution, represented by a large set of simulations.
Then, we observe how the area corresponding to the squared errors of the regression line (red) is smaller than the area corresponding to the squared errors of an alternative line (blue) for two arbitrary simulations. This is not surprising, since the regression parameters are those that attain, on average, the least squared errors (12.36).
The regression parameters define the best-fit line in the same way as the expectation and covariance define the best-fit ellipsoid in Example 36.23.
Note that the optimal loadings do not depend on the scale matrix specifying the r-squared (12.10).
Note that all that matters to compute the optimal loadings (12.40) and the optimal shift (12.41) are the first two moments of the joint distribution of target variables and factors. Hence, any result holds for the whole equivalence class of distributions identified by the first two moments (12.7).
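Since only the first two moments matter, the loadings and the shift can be computed directly from a joint expectation vector and covariance matrix. The following is a minimal sketch: the function name `regression_loadings`, the ordering of the variables (targets first, factors last) and the numerical inputs are assumptions made for illustration.

```python
import numpy as np

def regression_loadings(mu, cov, n_targets):
    """Return (alpha, beta) from the joint mean and covariance of (X, Z).

    The first n_targets entries refer to the targets X, the remaining
    ones to the factors Z (an ordering assumed for this sketch).
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mu_x, mu_z = mu[:n_targets], mu[n_targets:]
    cov_xz = cov[:n_targets, n_targets:]
    cov_z = cov[n_targets:, n_targets:]
    # beta = Cov(X, Z) Cov(Z)^{-1}, solved as a linear system (no explicit inverse)
    beta = np.linalg.solve(cov_z, cov_xz.T).T
    # Zero-expectation residuals pin down the shift
    alpha = mu_x - beta @ mu_z
    return alpha, beta

# Hypothetical moments of one target and one factor
mu = np.array([1.0, 0.5])
cov = np.array([[4.0, 1.2],
                [1.2, 1.0]])
alpha, beta = regression_loadings(mu, cov, n_targets=1)
print(alpha, beta)
```

Any two distributions sharing these moments, normal or not, produce the same `alpha` and `beta`.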
The regression prediction (12.3) becomes
This equation shows that the regression prediction lives in a subspace whose dimension equals the number of factors, embedded in the higher-dimensional space of the original target variables and factors, see Figure 12.5.
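The dimensionality statement can be illustrated on simulations. The following is a sketch with hypothetical dimensions (three targets, two factors) and an arbitrary random mixing matrix: after centering, the joint simulations of prediction and factors span only as many directions as there are factors, whereas the original simulations span the full space.

```python
import numpy as np

rng = np.random.default_rng(1)
n_targets, n_factors, n_sims = 3, 2, 1000
n = n_targets + n_factors

# Hypothetical correlated simulations of (X, Z)
joint = rng.normal(size=(n_sims, n)) @ rng.normal(size=(n, n))
x, z = joint[:, :n_targets], joint[:, n_targets:]

# Sample loadings and shift
cov = np.cov(joint, rowvar=False)
beta = np.linalg.solve(cov[n_targets:, n_targets:], cov[n_targets:, :n_targets]).T
alpha = x.mean(axis=0) - beta @ z.mean(axis=0)

# Regression prediction for each simulation
x_pred = alpha + z @ beta.T

# Rank of the centered simulations: (prediction, Z) versus (X, Z)
stacked = np.column_stack([x_pred, z])
rank_pred = np.linalg.matrix_rank(stacked - stacked.mean(axis=0), tol=1e-8)
rank_orig = np.linalg.matrix_rank(joint - joint.mean(axis=0), tol=1e-8)
print(rank_pred, rank_orig)
```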
The resulting r-squared (12.46) is expressed in terms of the covariance matrix between target variables and factors and the covariance matrix of the factors.
Since a model that fits the target variable well must have a high r-squared, i.e. close to one, the factors should be as uncorrelated with each other as possible. If there are high correlations among the factors, or if there is high collinearity, the covariance matrix of the factors would be ill-conditioned [W] and could become singular. Geometrically, this means that the space of the prediction (12.44) is not properly identified, see Figure 12.5.
Consider a univariate target variable and two observable factors, and suppose that the three variables are jointly normal.
In Figure 12.5 we compare the target variable and the corresponding regression prediction (12.44), which is the projection of the target variable onto the regression plane. More precisely:
-) In the left plot we show the simulations of the target variable and the factors, generated by varying the entries of the correlation matrix, along with the regression plane.
-) In the right plot we show the corresponding projected simulations of the prediction.
Notice that when the factors are collinear, i.e. when their correlation equals one in absolute value, the predicted simulations (green dots) collapse onto a line, which is a one-dimensional subspace of the regression plane (see the right plot). This means that any two-dimensional regression plane passing through that line is admissible (see the left plot), and so the space of the prediction is not properly identified.
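The loss of identification can be quantified through the condition number of the factor correlation matrix, which must be inverted to compute the loadings. The following is a small sketch for two factors, where the helper name `factor_condition_number` and the chosen correlation values are hypothetical.

```python
import numpy as np

def factor_condition_number(rho):
    """Condition number of the correlation matrix of two factors."""
    corr = np.array([[1.0, rho],
                     [rho, 1.0]])
    return np.linalg.cond(corr)

# The condition number grows without bound as the factors become collinear
for rho in (0.0, 0.5, 0.9, 0.999):
    print(f"rho = {rho:5.3f}   condition number = {factor_condition_number(rho):10.1f}")
```

For two factors the condition number equals the ratio of the two eigenvalues, one plus the absolute correlation over one minus the absolute correlation, so it diverges as the correlation approaches one in absolute value.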
Furthermore, the r-squared expression (12.46) supports the intuition that a “good” model must display high overall correlations between the target variables and the factors. Indeed, consider a simple case with one univariate target variable and one factor, which are jointly normal with a given correlation. In this case, the r-squared (12.46) reduces to the squared correlation between the factor and the target variable (12.48).
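The univariate result can be verified with a short computation. The sketch below assumes that the r-squared is the ratio of the scale-weighted trace of the explained covariance to the scale-weighted trace of the target covariance. This trace form, the function name `r_squared` and the numerical inputs are assumptions of the sketch, not a quotation of (12.10) or (12.46).

```python
import numpy as np

def r_squared(cov, n_targets, scale=None):
    """Scale-weighted r-squared of a regression, from the joint covariance.

    Targets come first in `cov`, factors last; `scale` is a symmetric
    positive-definite matrix (identity by default). Assumed trace form.
    """
    cov = np.asarray(cov, dtype=float)
    cov_x = cov[:n_targets, :n_targets]
    cov_xz = cov[:n_targets, n_targets:]
    cov_z = cov[n_targets:, n_targets:]
    if scale is None:
        scale = np.eye(n_targets)
    scale_inv = np.linalg.inv(scale)
    # Covariance of the prediction: Cov(X, Z) Cov(Z)^{-1} Cov(Z, X)
    explained = cov_xz @ np.linalg.solve(cov_z, cov_xz.T)
    return np.trace(scale_inv @ explained) / np.trace(scale_inv @ cov_x)

# Hypothetical univariate case: standard deviations and correlation
sd_x, sd_z, rho = 2.0, 1.0, 0.6
cov = np.array([[sd_x**2, rho * sd_x * sd_z],
                [rho * sd_x * sd_z, sd_z**2]])
r2 = r_squared(cov, n_targets=1)
r2_rescaled = r_squared(cov, n_targets=1, scale=np.array([[5.0]]))
print(r2, r2_rescaled, rho**2)
```

In the univariate case the scalar scale cancels between numerator and denominator, so both calls return the squared correlation.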
From the regression prediction (12.44) we can compute the residuals as the difference between the target variables and their prediction, or
The residuals are uncorrelated with the factors E.12.3
and thus the factors are systematic (12.16). As a matter of fact, the optimal matrix of regression loadings (12.40) is, among all possible loadings matrices, the only one that makes the residuals uncorrelated with the factors. See Section 12.6.5 for more on this profound result.
Finally, the explicit expression of the covariance of the residuals reads E.12.4
Hence, in general, the residuals are not uncorrelated with each other.
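Both properties of the residuals can be checked from the second moments alone. The sketch below uses a hypothetical joint covariance of two targets and two factors, and the standard expressions for the cross-covariance between residuals and factors and for the residual covariance. The variable names are illustrative.

```python
import numpy as np

# Hypothetical joint covariance: two targets (first block), two factors (last block)
cov = np.array([[2.0, 0.3, 0.5, 0.1],
                [0.3, 1.5, 0.2, 0.6],
                [0.5, 0.2, 1.0, 0.2],
                [0.1, 0.6, 0.2, 1.2]])
n = 2
cov_x, cov_xz, cov_z = cov[:n, :n], cov[:n, n:], cov[n:, n:]

# Regression loadings: Cov(X, Z) Cov(Z)^{-1}
beta = np.linalg.solve(cov_z, cov_xz.T).T

# Covariance between residuals and factors: Cov(X, Z) - beta Cov(Z)
cov_uz = cov_xz - beta @ cov_z

# Covariance of the residuals: Cov(X) - Cov(X, Z) Cov(Z)^{-1} Cov(Z, X)
cov_u = cov_x - beta @ cov_xz.T

print(cov_uz)
print(cov_u)
```

The matrix `cov_uz` vanishes up to rounding, while `cov_u` has a non-zero off-diagonal entry: the residuals are uncorrelated with the factors but not with each other.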
Similar to the r-squared (12.46), the covariance expression (12.52) again supports the intuition that a “good” model must display high overall correlations between the target variables and the factors. Indeed, consider a simple case with one univariate target variable and one factor, which are jointly normal with a given correlation. Then, the residual variance, which reads E.12.14
is minimal when the r-squared (12.48) is maximal, or, equivalently, when the correlation equals one in absolute value.
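The univariate statement follows from a short derivation. The symbols are labels assumed for this sketch: $\sigma_X$ and $\sigma_Z$ are the standard deviations of target and factor, $\rho$ is their correlation, $\beta$ the loading and $U$ the residual.

```latex
\begin{aligned}
\beta &= \frac{\operatorname{Cov}\{X, Z\}}{\operatorname{Var}\{Z\}}
       = \rho\,\frac{\sigma_X}{\sigma_Z}, \\
\operatorname{Var}\{U\} &= \operatorname{Var}\{X\} - \beta\,\operatorname{Cov}\{Z, X\}
       = \sigma_X^2 - \rho^2 \sigma_X^2
       = \sigma_X^2\,(1 - \rho^2).
\end{aligned}
```

The residual variance therefore decreases as $\rho^2$ grows and vanishes when $|\rho| = 1$.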
Example 12.9. We continue from Example 12.8. The residual and the factors are jointly normal.
The linear regression (12.35) allows us to introduce a projection operator, which can be thought of as the linear counterpart of the operation of conditioning, and two ensuing summary statistics, which can be thought of as the linear counterpart of the conditional distribution (32.48).
Just as the conditional distribution represents all that can be inferred about the target variables from i) knowledge of the factors and ii) the joint distribution of target variables and factors, the two summary statistics that we introduce below represent all that can be inferred about the target variables from i) knowledge of the factors and ii) the first two moments (12.7) of the joint distribution, as we summarize in the table below.
The above table is of capital importance to define the two and only two approaches currently used in statistics and finance.
To define the first summary statistic, let us interpret the linear regression prediction (12.70) as a linear projection
The nomenclature “linear projection” is justified because the linear prediction (12.70) can be interpreted geometrically at the same time as best prediction (28.190) or, equivalently, as orthogonal projection (28.172), as we will show later in (12.71).
To introduce the second summary statistic, let us interpret the residual in linear regression (12.35) as a weak innovation
where the nomenclature “weak” is due to the residual being uncorrelated with the factors (12.76), as opposed to being independent of them (“strong” innovation).
The linear loss matrix (12.60) depends on the third order moments of the variables E.40.51 and thus cannot be determined from the first two moments (12.7) alone. Furthermore, it does not identify a symmetric and positive definite matrix in general E.40.51, except in very special cases E.40.53. For this reason, the linear loss matrix is not a good candidate for the second summary statistic, in the way that the linear projection (12.57) is for the first.
However, we can define the (linear) error prediction matrix, or partial covariance matrix, as the expectation of the loss matrix projection (12.60)
which also explicitly reads E.40.51
Note how, unlike the linear loss matrix (12.60), the partial covariance matrix (12.61) depends only on the second order moments of the variables, which are known (12.7), and is clearly symmetric and positive definite.
By construction, the partial covariance (12.61) of the target variable evaluated at a specific realization is constant
so that we can use the two notations, with and without the specific realization, interchangeably.
We can extend the definition of partial covariance matrix (12.61) to pairs of target random vectors, possibly of different dimensions, by means of the cross-partial covariance matrix
We say that two target vectors are partially orthogonal with respect to the factors if their residuals are orthogonal, or equivalently, if their cross-partial covariance matrix (12.65) is zero.
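Partial orthogonality can be illustrated with two targets that are correlated only through a common factor. The sketch below assumes the standard expression of the cross-partial covariance, i.e. the cross-covariance of the targets minus the part explained by the factors. The function name `cross_partial_cov` and the numerical setup are hypothetical.

```python
import numpy as np

def cross_partial_cov(cov_x1x2, cov_x1z, cov_x2z, cov_z):
    """Cov(X1, X2) - Cov(X1, Z) Cov(Z)^{-1} Cov(Z, X2)."""
    return cov_x1x2 - cov_x1z @ np.linalg.solve(cov_z, cov_x2z.T)

# Hypothetical setup: X1 = Z + e1 and X2 = 2 Z + e2, where e1, e2 and Z
# are mutually uncorrelated
var_z, var_e1 = 1.0, 0.5
cov_z = np.array([[var_z]])
cov_x1z = np.array([[1.0 * var_z]])
cov_x2z = np.array([[2.0 * var_z]])
cov_x1x2 = np.array([[2.0 * var_z]])   # the targets are correlated
cov_x1 = np.array([[var_z + var_e1]])

# Cross-partial covariance of the two targets given the factor
cross = cross_partial_cov(cov_x1x2, cov_x1z, cov_x2z, cov_z)

# Partial covariance of X1 is the special case with X2 = X1
partial_x1 = cross_partial_cov(cov_x1, cov_x1z, cov_x1z, cov_z)

print(cross, partial_x1)
```

Here the two targets have non-zero covariance, yet their cross-partial covariance vanishes, so they are partially orthogonal with respect to the factor. The partial covariance of the first target reduces to the variance of its idiosyncratic term.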