# Body of Knowledge

### Data Science for Finance

The module "Data Science for Finance" is the largest among the four modules of the ARPM Body of Knowledge.

This module covers the statistical tools needed to model and estimate the joint dynamics of the markets. Unlike related approaches in computer science or engineering, we root our coverage of data science in the pillars of quantitative statistics for finance (the "P" in ARPM). In particular:

- we introduce all machine learning/artificial intelligence models as generalizations of linear factor models, which are omnipresent (and misused) across finance

- we connect the estimation/calibration of all machine learning/artificial intelligence models with classical and Bayesian econometrics

- we address backtesting and model/estimation risk in the context of decision theory

- we translate machine learning/artificial intelligence inference into market view processing: distributional stress-testing for risk management and portfolio/business construction for portfolio management.

Multivariate distributions

Location and dispersion

Copulas

Exponential decay and time conditioning

Kernels and state conditioning

Joint state and time conditioning/minimum entropy

Statistical power of flexible probabilities: effective number of scenarios

Law of large numbers and historical distribution with flexible probabilities

Location-dispersion: HFP mean/covariance and best-fitting ellipsoid

General principle

Maximum likelihood with flexible probabilities and entropy formulation

Location-dispersion: MLFP ellipsoid under Student t distribution

Local robustness: influence function and Huber’s “M” estimators

Global robustness: breakdown point, minimum volume ellipsoid

Outlier detection

Terminology

Key ideas from non-parametric linear factor models

Supervised point predictors

Supervised probabilistic predictors

Unsupervised autoencoders

Latent variable models

Probabilistic graphical models

Background

Fit and assessment

Logistic regression

Interactions

Encoding

Regularization

Trees

Gradient boosting

Cross-validation

Background

Standard estimators

Cross-validation

Point prediction assessment

Probabilistic prediction assessment
