### 8.2 Linear least squares regression models

Key points

• A regression LFM (8.70) for an $\bar{n}$-dimensional target variable $X$ is a dominant-residual decomposition (8.70)-(8.72) with exogenous and observable factors $Z$.
• The maximal r-squared (8.93), which represents the predictive power of the fitted model (8.70), is large only when the correlation between the target variables $X$ and the factors $Z$ is high and the correlation among the factors $Z$ is low.

The first of the three main classes of dominant-residual LFM’s (8.33) are regression LFM’s. Regression LFM’s are also known as “macroeconomic” LFM’s, because in some applications the factors are macroeconomic variables, such as interest rates, stock market returns, etc. The purpose of regression models is explanatory: we want to explain a given large or small (even one) number $\bar{n}$ of target variables $X$, such as some S&P 500 stock returns, as much as possible in terms of a large or small number $\bar{k}$ of observable factors $Z$, such as one or more total return indexes.

Regression models can be useful either to perform dimension reduction, e.g. to draw Monte Carlo scenarios of large-dimensional variables (see Section 7c.2.9); or risk attribution, e.g. to compute the optimal hedge or to attribute the risk in the portfolio P&L to a few key drivers (see Chapter 45a).

Beyond the financial industry, regression models are extensively implemented in many mathematical fields, most notably supervised learning (Chapters 12-13), as shown in Figure 10.1, and they have become very popular in the statistical literature because of their appealing mathematical features, see Section 8.6.13.

For example, it is common practice to use regression models for prediction and forecasting (Section 11.2), say to understand simple linear causal relationships between the target variables (or dependent variables), such as tomorrow’s temperature in London, and the factors (or independent variables), such as some current measurements in England (average temperature, atmospheric pressure, humidity, etc.) [W].

Example 8.12. Regression model for forecasting
Let us suppose that we are a portfolio manager and we want to forecast the return of a portfolio of two stocks, Aon PLC (AON) and Cablevision Systems Corp (CVC), using the S&P 500 index, similar to Intuition 37.
In this case, the target variable $X$ is the portfolio return (43.3) between today and tomorrow

$$X \equiv w_{1,t_{\mathit{now}}} R_{1,t_{\mathit{now}}+1} + w_{2,t_{\mathit{now}}} R_{2,t_{\mathit{now}}+1}, \tag{8.62}$$

where the portfolio weights are defined in (43.4); and the factor $Z$ (45a.1) is the return on the S&P 500 index (8.2), but between yesterday and today

$$Z \equiv R_{S\&P,\,t_{\mathit{now}}}. \tag{8.63}$$

Then we consider a regression linear factor model (8.70) as our predictive model

 X=α+βZ+˚ε, (8.64)

where the model parameters $(\alpha, \beta)$ are obtained via r-squared maximization (8.72).
Suppose that we also have estimates for their joint expectation vector (2b.17) and covariance matrix (2b.29)

$$E\left\{\begin{pmatrix} X \\ Z \end{pmatrix}\right\} \leftarrow 10^{-3}\times\begin{pmatrix} 0.43 \\ 0.16 \end{pmatrix}, \qquad Cv\left\{\begin{pmatrix} X \\ Z \end{pmatrix}\right\} \leftarrow 10^{-3}\times\begin{pmatrix} 0.29 & 0.21 \\ 0.21 & 0.25 \end{pmatrix}. \tag{8.65}$$

We will use the estimates (8.65) as our proxy for the true, unknown parameters. Then the optimal loading (8.84) reads

$$\beta = \frac{\overbrace{10^{-3}\times 0.21}^{Cv\{X,Z\}}}{\underbrace{10^{-3}\times 0.25}_{V\{Z\}}} = 0.83, \tag{8.66}$$

and the optimal shift (8.86) reads

$$\alpha = \overbrace{10^{-3}\times 0.43}^{E\{X\}} - 0.83 \times \overbrace{10^{-3}\times 0.16}^{E\{Z\}} = 10^{-3}\times 0.29. \tag{8.67}$$

Finally, we can use the estimated model (8.64) to perform forecasting: given today’s realization $z$ of the S&P 500 index return (8.63), we forecast the outcome of tomorrow’s portfolio return (8.62)

$$z = 0.37\% \;\Rightarrow\; x = \alpha + \beta z = 0.34\%. \tag{8.68}$$

Then we can compute the r-squared (8.93) achieved by the regression model (8.64)

$$R^2\{\bar{X}^{\mathit{Reg}} \,\|\, X\} = \frac{\overbrace{10^{-6}\times 0.044}^{(Cv\{X,Z\})^2}}{\underbrace{10^{-3}\times 0.29}_{V\{X\}} \times \underbrace{10^{-3}\times 0.25}_{V\{Z\}}} = 57\%, \tag{8.69}$$

which is not large, as is typical of market prediction applications. Hence, the forecast (8.68) is imprecise.
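The arithmetic of this example can be replayed in a few lines; a minimal sketch using the rounded moments (8.65), so the resulting loading (0.84) and r-squared (about 61%) differ slightly from the unrounded values 0.83 and 57% quoted above:

```python
# Univariate regression fit from the (rounded) moments in (8.65).
e_x, e_z = 0.43e-3, 0.16e-3   # expectations E{X}, E{Z}
v_x, v_z = 0.29e-3, 0.25e-3   # variances V{X}, V{Z}
cv_xz = 0.21e-3               # covariance Cv{X, Z}

beta = cv_xz / v_z            # optimal loading, cf. (8.66)
alpha = e_x - beta * e_z      # optimal shift, cf. (8.67)

# Point forecast, cf. (8.68): today's index return -> tomorrow's portfolio return.
z = 0.37e-2
x_bar = alpha + beta * z

# Maximal r-squared, cf. (8.69): squared correlation of X and Z.
r2 = cv_xz**2 / (v_x * v_z)
```

With the rounded inputs the sketch gives a forecast of about 0.34%, matching (8.68).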

Regression models can be suitable or not depending on the (hidden) co-dependence relationships among the variables involved. In particular, regression models, like all LFM’s in general, are uniquely identified by the mean-covariance equivalence class of the variables involved (8.13): only the first two moments of the joint distribution are needed for the fit.
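This identification by the first two moments can be illustrated numerically: two different joint distributions sharing the same mean and covariance yield, up to simulation error, the same regression fit. A minimal sketch with hypothetical moments:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical first two moments for the pair (X, Z).
mu = np.array([0.1, 0.0])
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
chol = np.linalg.cholesky(cov)

def fit(x, z):
    """Regression coefficients computed from scenarios via the moment formulas."""
    beta = np.cov(x, z)[0, 1] / np.var(z, ddof=1)
    return beta, x.mean() - beta * z.mean()

# Two different joint distributions with the same mean and covariance:
gauss = mu + rng.standard_normal((n, 2)) @ chol.T                      # bivariate normal
unif = mu + rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (n, 2)) @ chol.T  # correlated uniforms

beta_g, alpha_g = fit(gauss[:, 0], gauss[:, 1])
beta_u, alpha_u = fit(unif[:, 0], unif[:, 1])
# Both fits recover approximately beta = 0.6, alpha = 0.1.
```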

The regression prediction (8.90) has a dual geometrical interpretation: it is both the best linear prediction (4b.19) and the orthogonal projection (4b.20) of the target variables onto the linear span of the factors, see Figure 4b.1.

Beyond the ordinary least-squares approach (Section 8.2.6), a proper estimation of regression models, as for all linear factor models (Figure 8.4), requires many considerations that we expand on in Chapter 25. Furthermore, estimation can (and should) be further improved by embedding in the model fit (8.70)-(8.72) constraints (Section 8.6.17) or penalizations, as in factor selection (Section 24.4) or any other regularization technique (Section 22.10).

The remainder of this section is organized as follows.

In Section 8.2.1 we introduce regression models as solution of an r-squared maximization with specific constraints.

In Section 8.2.2 we show the analytical expression of the optimal regression loadings and shifts.

In Section 8.2.3 we show the regression prediction and the maximal r-squared it achieves.

In Section 8.2.4 we verify the systematic and idiosyncratic features of the regression residuals.

In Section 8.2.5 we show the natural scale matrix defining the r-squared in regression models.

In Section 4b.2 we give a geometrical interpretation of the linear regression prediction via orthogonal projections.

In Section 8.2.6 we discuss the estimation of regression models.

#### 8.2.1 Definition

A regression, or macroeconomic, linear factor model (LFM) for an $\bar{n}$-dimensional target variable $X$ is a dominant-residual decomposition (8.33)

$$\underbrace{\begin{pmatrix} X_1 \\ \vdots \\ X_{\bar{n}} \end{pmatrix}}_{\text{observable}} = \underbrace{\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{\bar{n}} \end{pmatrix}}_{\text{constructed}} + \underbrace{\begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,\bar{k}} \\ \vdots & & \vdots \\ \beta_{\bar{n},1} & \cdots & \beta_{\bar{n},\bar{k}} \end{pmatrix}}_{\text{constructed}} \times \underbrace{\begin{pmatrix} Z_1 \\ \vdots \\ Z_{\bar{k}} \end{pmatrix}}_{\text{observable}} + \underbrace{\begin{pmatrix} \mathring{\varepsilon}_1 \\ \vdots \\ \mathring{\varepsilon}_{\bar{n}} \end{pmatrix}}_{\text{constructed}} \tag{8.70}$$

or in compact matrix notation

$$X = \alpha + \beta Z + \mathring{\varepsilon}. \tag{8.71}$$

In (8.70) the target variables $X$ and the factors $Z$ are observable. In particular, we assume known the mean-covariance equivalence class of their joint distribution (8.13), and thus their joint expectations and covariances. The loadings matrix $\beta$ is then constructed so as to maximize the r-squared of the given factors. More precisely, consider:

i) a symmetric and positive-definite scale matrix $\sigma^2$ that defines the r-squared objective (8.28);

ii) and a number $\bar{k}$ of observable factors $Z$.

Then a regression LFM is a dominant-residual LFM (8.34)

$$(\alpha, \beta) \equiv \mathop{\mathrm{argmax}}_{(a,\,b,\,F)\in C}\; R^2_{\sigma^2}\{a + b F \,\|\, X\}, \tag{8.72}$$

where $s$ is the Riccati root of the scale matrix $\sigma^2$ (31.453) defining the r-squared (8.28), and the constraints are:

i) the factors are given exogenously, $F \equiv Z$, while the loadings matrix $b$ is free;

ii) the residuals have zero expectation, $E\{\mathring{\varepsilon}\} = 0$, which together with i) implies $a = E\{X\} - b\,E\{Z\}$.

Therefore, the constraints become

$$C \equiv \left\{(a, b, F):\ b \text{ free},\ F = Z,\ a = E\{X\} - b\,E\{Z\}\right\}. \tag{8.73}$$

Then the matrix $b$ is optimized in (8.72) to yield the optimal loadings $\beta$, which maximize the r-squared.

In the estimation context the dominant-residual regression framework (8.72) becomes the ordinary least squares optimization (8.108).

Regression LFM’s are arguably the most implemented models for supervised learning because of their intuitive features (Section 8.6.13). Furthermore, regression LFM’s can be mapped from the affine format (8.70) to an equivalent linear format by considering the constant $1$ as an additional factor, see Section 12.1.4.
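As a quick sketch of this affine-to-linear mapping (the data below are simulated, with hypothetical parameters, not from the text), appending the constant 1 as an additional factor recovers $(\alpha, \beta)$ in a single linear least-squares solve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scenarios for one factor Z and one target X (hypothetical parameters).
z = rng.standard_normal(1000)
x = 0.5 + 2.0 * z + 0.3 * rng.standard_normal(1000)

# The affine format X = alpha + beta Z + eps becomes linear in Z~ = (1, Z)'.
z_tilde = np.column_stack([np.ones_like(z), z])
coef, *_ = np.linalg.lstsq(z_tilde, x, rcond=None)
alpha, beta = coef   # recovers approximately (0.5, 2.0)
```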

Example 8.13. Regression model for forecasting
Let us suppose that we are a portfolio manager and we want to forecast the return of a portfolio of two stocks, Aon PLC (AON) and Cablevision Systems Corp (CVC), using the S&P 500 index, similar to Intuition 37.
In this case, the target variable $X$ is the portfolio return (43.3) between today and tomorrow

$$X \equiv w_{1,t_{\mathit{now}}} R_{1,t_{\mathit{now}}+1} + w_{2,t_{\mathit{now}}} R_{2,t_{\mathit{now}}+1}, \tag{8.74}$$

where the portfolio weights are defined in (43.4); and the factor $Z$ (45a.1) is the return on the S&P 500 index (8.2), but between yesterday and today

$$Z \equiv R_{S\&P,\,t_{\mathit{now}}}. \tag{8.75}$$

Then we consider a regression linear factor model (8.70) as our predictive model

$$X = \alpha + \beta Z + \mathring{\varepsilon}, \tag{8.76}$$

where the model parameters $(\alpha, \beta)$ are obtained via r-squared maximization (8.72).
Suppose that we also have estimates for their joint expectation vector (2b.17) and covariance matrix (2b.29)

$$E\left\{\begin{pmatrix} X \\ Z \end{pmatrix}\right\} \leftarrow 10^{-3}\times\begin{pmatrix} 0.43 \\ 0.16 \end{pmatrix}, \qquad Cv\left\{\begin{pmatrix} X \\ Z \end{pmatrix}\right\} \leftarrow 10^{-3}\times\begin{pmatrix} 0.29 & 0.21 \\ 0.21 & 0.25 \end{pmatrix}. \tag{8.77}$$

We will use the estimates (8.77) as our proxy for the true, unknown parameters.

In Figure 8.5 we show the intuition and pitfalls behind linear ordinary least squares regression.
Consider a univariate target variable $X$, such as one stock’s compounded return, and one observable factor $Z$, such as the compounded return of a different stock. To illustrate, suppose that the variables are jointly bivariate normal (7c.1)

$$\begin{pmatrix} X \\ Z \end{pmatrix} \sim \mathrm{N}\!\left(\begin{pmatrix} \mu_X \\ \mu_Z \end{pmatrix},\ \begin{pmatrix} \sigma_X^2 & \varrho_{X,Z}\,\sigma_X \sigma_Z \\ \varrho_{X,Z}\,\sigma_X \sigma_Z & \sigma_Z^2 \end{pmatrix}\right). \tag{8.78}$$

We show a large number of joint scenarios and the respective marginal histograms. We also draw the least squares regression line

$$x = \underbrace{E\{X\} - \frac{Cv\{X,Z\}}{V\{Z\}}\,E\{Z\}}_{\alpha} + \underbrace{\frac{Cv\{X,Z\}}{V\{Z\}}}_{\beta}\, z, \tag{8.79}$$

together with an arbitrary line $x = a + bz$. The regression line (8.79) is the one that best fits the scenarios among all possible lines $x = a + bz$.
Indeed, the area corresponding to its squared errors (yellow) is smaller than the area corresponding to the arbitrary line (blue).
Once we have fitted the regression line to the distribution (8.78), the prediction (8.15) is a random variable that takes values on the line (8.79). We can then use the prediction to postulate, or observe, a value $z$ for the factor and infer the predicted mean for the target variable.
We can then change the distribution (8.78), for instance by varying the correlation $\varrho_{X,Z}$. As the distribution changes, the regression line (8.79) adapts and maintains the best least squares fit (8.72).
To better analyze the fit, we focus on the joint distribution of the prediction $\bar{X}$ and the target variable $X$, and the respective r-squared (8.28). As the joint distribution of the target and the factor varies, the prediction remains positively correlated with the target variable.
To better understand the residual $\mathring{\varepsilon}$, we show the joint distribution of the prediction $\bar{X}$ and the residual $\mathring{\varepsilon}$. As the fit, or r-squared (8.28), increases, the residual decreases. Regardless, the prediction is always uncorrelated with the residual, and therefore the mean-covariance ellipse (2b.59) has its principal axes parallel to the reference axes (2b.71).
These key features are preserved regardless of the distribution of the target $X$ and the factor $Z$. For instance, consider the case where $X$ and $Z$ are the prices of the above two stocks, so that they are jointly lognormal (7c.461)

$$\begin{pmatrix} X \\ Z \end{pmatrix} \sim \mathrm{LogN}\!\left(\begin{pmatrix} \mu_X \\ \mu_Z \end{pmatrix},\ \begin{pmatrix} \sigma_X^2 & \varrho_{X,Z}\,\sigma_X \sigma_Z \\ \varrho_{X,Z}\,\sigma_X \sigma_Z & \sigma_Z^2 \end{pmatrix}\right). \tag{8.80}$$

Then again the least squares regression line (8.79) is the one that, among all possible lines, achieves the best fit in terms of the squared residuals (8.72).
As the first two moments of the distribution (8.80) change, the prediction (8.15) provides a linear approximation to a possibly nonlinear relationship between target and factor.
However, the prediction remains positively correlated with the target $X$, it remains uncorrelated with the residual $\mathring{\varepsilon}$, and the fit improves as the residual decreases.
Linear least squares regression also works with discrete variables. To illustrate, consider the mixture case (7c.413), where the target is binary: $X = 1$ if the stock goes up and $X = 0$ if it goes down (7c.275)

$$X \sim \mathrm{Bernoulli}(p) \tag{8.81}$$

and the factor $Z$ is conditionally normal (7c.418)

$$Z \,|\, x \sim \mathrm{N}(\mu_x, \sigma_x^2). \tag{8.82}$$

The regression line (8.79) then allows us to predict the mean of the target variable, which is the probability of the stock going up. Again, the prediction (8.15) is positively correlated with the target $X$, and uncorrelated with the residual $\mathring{\varepsilon}$.
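The invariants described above (prediction positively correlated with the target and uncorrelated with the residual, regardless of the distribution) can be checked by Monte Carlo; a minimal sketch with hypothetical parameters for the three cases:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def regression_check(x, z):
    """Fit x = alpha + beta z + eps on scenarios; return corr(pred, target), corr(pred, resid)."""
    beta = np.cov(x, z)[0, 1] / np.var(z, ddof=1)
    alpha = x.mean() - beta * z.mean()
    x_bar = alpha + beta * z        # prediction
    eps = x - x_bar                 # residual
    return np.corrcoef(x_bar, x)[0, 1], np.corrcoef(x_bar, eps)[0, 1]

cases = []

# 1) jointly normal target and factor, cf. (8.78)
z = rng.standard_normal(n)
cases.append((1.0 + 0.8 * z + 0.5 * rng.standard_normal(n), z))

# 2) jointly lognormal target and factor, cf. (8.80)
w = rng.standard_normal(n)
cases.append((np.exp(0.3 * w + 0.2 * rng.standard_normal(n)), np.exp(0.5 * w)))

# 3) Bernoulli target, cf. (8.81), with conditionally normal factor, cf. (8.82)
x_b = rng.binomial(1, 0.4, n).astype(float)
cases.append((x_b, rng.normal(loc=x_b, scale=1.0)))

results = [regression_check(x_i, z_i) for x_i, z_i in cases]
# In every case the prediction is positively correlated with the target
# and (numerically) uncorrelated with the residual.
```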

Given the constraints (8.73), the key variables to solve for in the regression LFM optimization (8.72) are the loadings $\beta$, which can be computed analytically

$$\begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,\bar{k}} \\ \vdots & & \vdots \\ \beta_{\bar{n},1} & \cdots & \beta_{\bar{n},\bar{k}} \end{pmatrix} = \begin{pmatrix} Cv\{X_1,Z_1\} & \cdots & Cv\{X_1,Z_{\bar{k}}\} \\ \vdots & & \vdots \\ Cv\{X_{\bar{n}},Z_1\} & \cdots & Cv\{X_{\bar{n}},Z_{\bar{k}}\} \end{pmatrix} \times \begin{pmatrix} V\{Z_1\} & \cdots & Cv\{Z_1,Z_{\bar{k}}\} \\ \vdots & & \vdots \\ Cv\{Z_{\bar{k}},Z_1\} & \cdots & V\{Z_{\bar{k}}\} \end{pmatrix}^{-1} \tag{8.83}$$

or in compact matrix notation

$$\beta = Cv\{X, Z\}\,(Cv\{Z\})^{-1}. \tag{8.84}$$

Note that the optimal loadings (8.84) do not depend on the scale matrix $\sigma^2$ specifying the r-squared (8.28).
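In matrix form, (8.84) and the zero-expectation shift constraint are one line of linear algebra each; a minimal numpy sketch on hypothetical moments with $\bar{n} = 2$ targets and $\bar{k} = 3$ factors:

```python
import numpy as np

# Hypothetical joint moments for n = 2 targets X and k = 3 factors Z.
e_x = np.array([0.10, 0.20])                # E{X}
e_z = np.array([0.05, 0.00, -0.03])         # E{Z}
cv_xz = np.array([[0.30, 0.10, 0.05],
                  [0.12, 0.25, 0.02]])      # Cv{X, Z}
cv_z = np.array([[1.00, 0.20, 0.10],
                 [0.20, 0.80, 0.05],
                 [0.10, 0.05, 0.60]])       # Cv{Z}, symmetric positive-definite

# Optimal loadings, cf. (8.84): beta = Cv{X,Z} (Cv{Z})^{-1}.
beta = cv_xz @ np.linalg.inv(cv_z)
# Numerically preferable: solve the linear system instead of inverting.
beta_solve = np.linalg.solve(cv_z, cv_xz.T).T

# Optimal shift from the zero-expectation constraint in (8.73): alpha = E{X} - beta E{Z}.
alpha = e_x - beta @ e_z
```

By construction the residual then has zero expectation, since $E\{X\} - \alpha - \beta E\{Z\} = 0$.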

Given the optimal loadings (8.83), we obtain the optimal shift $\alpha$ from the zero expectation constraint (8.73)

$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{\bar{n}} \end{pmatrix} = \begin{pmatrix} E\{X_1\} \\ \vdots \\ E\{X_{\bar{n}}\} \end{pmatrix} - \begin{pmatrix} Cv\{X_1,Z_1\} & \cdots & Cv\{X_1,Z_{\bar{k}}\} \\ \vdots & & \vdots \\ Cv\{X_{\bar{n}},Z_1\} & \cdots & Cv\{X_{\bar{n}},Z_{\bar{k}}\} \end{pmatrix} \times \begin{pmatrix} V\{Z_1\} & \cdots & Cv\{Z_1,Z_{\bar{k}}\} \\ \vdots & & \vdots \\ Cv\{Z_{\bar{k}},Z_1\} & \cdots & V\{Z_{\bar{k}}\} \end{pmatrix}^{-1} \times \begin{pmatrix} E\{Z_1\} \\ \vdots \\ E\{Z_{\bar{k}}\} \end{pmatrix} \tag{8.85}$$