Code

The Code lets the user absorb the contents of the ARPM Lab hands-on and understand all the practical implications of the Theory.
The Code is available in different languages, each embedded in its own coding environment: Python, MATLAB, R.
The coding environments can be accessed by clicking on the respective icons below or on the general Code icon in any part of the ARPM Lab, such as the Theory or the Documentation.


Python
85,847 lines of code
To learn how to code in Python, see the Refresher course

MATLAB
13,430 lines of code
In collaboration with MathWorks

R
2,443 lines of code
In collaboration with R Foundation


Here we describe the Python coding standards that the contributor is strictly committed to following.

Coding style

Always follow the PEP8 style guide.

For naming rules, follow Google’s naming convention:

Type | Public | Internal
Packages | lower_with_under |
Modules | lower_with_under | _lower_with_under
Classes | CapWords | _CapWords
Exceptions | CapWords |
Functions | lower_with_under() | _lower_with_under()
Global/Class Constants | CAPS_WITH_UNDER | _CAPS_WITH_UNDER
Global/Class Variables | lower_with_under | _lower_with_under
Instance Variables | lower_with_under | _lower_with_under (protected) or __lower_with_under (private)
Method Names | lower_with_under() | _lower_with_under() (protected) or __lower_with_under() (private)
Function/Method Parameters | lower_with_under |
Local Variables | lower_with_under |

In general, any variable, function or object name in the code must follow the name used in the ARPM Lab. For example:

  • the time series \(\{x_{t}\}_{t=1}^{\bar{t}}\) in the ARPM Lab should be called x in the code, indexed by t in range(t_);
  • the routine \(\mathit{fit\_locdisp\_mlfp\_difflength}\) in the ARPM Lab should be called fit_locdisp_mlfp_difflength in the code.

The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).

The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).

For inline comments, please see here.

For docstrings (comments on modules, functions and classes), please see section Docstrings.

Scripts must not run other scripts, i.e. the command

from s_script_title1 import *

is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimension.
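
A minimal sketch of this hand-off between two scripts, assuming pandas is used for CSV input and output. The file name db_example.csv, the column name x, the made-up data and the temporary directory (standing in for a databases directory) are all hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# hypothetical stand-in for the directory where databases are stored
db_dir = Path(tempfile.mkdtemp())
db_path = db_dir / 'db_example.csv'  # hypothetical file name

# --- what s_script_title1 would do: save its output as a database ---
t_ = 4
x = np.arange(t_, dtype=float)  # made-up time series, shape (t_,)
pd.DataFrame({'x': x}).to_csv(db_path, index=False)  # CSV, shape (t_, 1)

# --- what s_script_title2 would do: import the database, not the script ---
db = pd.read_csv(db_path)
x_loaded = db['x'].values  # back to a NumPy array of shape (t_,)
```

The second script depends only on the saved file, so it can run without executing the first script again.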

Scripts must be as modular as possible: any time code is copied and pasted, the contributor must evaluate the option of creating a function for those operations.

Scripts must be as simple as possible: any time there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.

Docstrings

Functions docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

  • one link to the respective ARPM Lab Code Documentation;
  • optional “See also” links;
  • type and shape of the input;
  • type and shape of the output.

# -*- coding: utf-8 -*-

import numpy as np

def single_output(x, y, z=None, *, option1='a', option2='c'):
    """For details, see here.

    Parameters
    ----------
    x : float
    y : array, shape (i_bar, )
    z : array, optional, shape (i_bar, j_bar)
    option1 : str, optional
    option2 : str, optional

    Returns
    -------
    g : float
    """

    # Step 1: Do this
    w = np.sin(x)

    # Step 2: Do that
    g = w + 3

    return g

Scripts docstring

The docstring and comments must strictly follow the template below.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.4'
#       jupytext_version: 1.1.5
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# # s_script_name
# For details, see here.

# +
import internal_packages

import external_packages
# -

# ## Input parameters

param1 = 1
param2 = 2

# ## Step 1: Compute x

x = param1 + param2

# ## Step 2: Compute y

y = x - 1

Variables dimension

The standards for the NumPy variables and CSV files are given in the table below:

Variable | Type | NumPy shape | DB (CSV) shape
Univariate realized process | Time series - past \(\bar{t}\) steps | (t_, ) | (t_, 1)
Univariate random variable | \(\bar{\jmath}\) MC scenarios | (j_, ) | (j_, 1)
Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_) | (j_*u_, 1)
\(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_) | (t_, n_)
\(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_) | (j_, n_)
\(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_) | (j_*u_, n_)
\((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_, k_) | (t_, n_*k_)
\((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_, k_) | (j_, n_*k_)
\((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_, k_) | (j_*u_, n_*k_)

To convert a NumPy array into a dataframe, the rule is:

Group all the dimensions in two buckets: first, those you want to be indices, then, those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
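
A minimal sketch of this rule for an \(\bar{n}\)-variate random process of shape (j_, u_, n_), assuming the dataframe is a pandas DataFrame. The sizes, the index names j and u, and the column labels x_0, x_1, ... are hypothetical.

```python
import numpy as np
import pandas as pd

j_, u_, n_ = 3, 2, 4  # made-up sizes
x = np.arange(j_ * u_ * n_, dtype=float).reshape(j_, u_, n_)

# indices bucket: (j, u); headers bucket: n
index = pd.MultiIndex.from_product([range(j_), range(u_)],
                                   names=['j', 'u'])
columns = ['x_' + str(n) for n in range(n_)]  # hypothetical labels
df = pd.DataFrame(x.reshape(j_ * u_, n_), index=index, columns=columns)

# inverse operation: recover the original NumPy array
x_back = df.values.reshape(j_, u_, n_)
```

The row-major reshape and the product index enumerate (j, u) in the same order, so each row of the dataframe stays paired with its scenario and step.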

Matrix algebra

Static matrix algebra

A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a 1D NumPy array of shape (n_, ) in Python (see Variables dimension). For example \[ \boldsymbol{v}_{t_{\mathit{now}}} \equiv \begin{pmatrix}v_{1,t_{\mathit{now}}} \\ v_{2,t_{\mathit{now}}} \end{pmatrix} = \begin{pmatrix}\$14.24 \\ \$48.61 \end{pmatrix} \] must read in Python

v_tnow = np.array([14.24, 48.61])

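NumPy handles 1D arrays flexibly. Namely, both

v_tnow @ np.array([[1, 2], [3, 4]])

and

np.array([[1, 2], [3, 4]]) @ v_tnow

are valid and return a 1D array of shape (2, ), because the 1D array is treated as a row vector \(1\times 2\) in the first product and as a column vector \(2\times 1\) in the second. The two results differ in general; they coincide only when the matrix is symmetric.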
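
The following sketch checks the two products above. The name a for the 2×2 matrix is introduced here for illustration only. Both products are valid and return a 1D array of shape (2, ); their values coincide only when the matrix is symmetric.

```python
import numpy as np

v_tnow = np.array([14.24, 48.61])
a = np.array([[1, 2], [3, 4]])  # name `a` is hypothetical

row_result = v_tnow @ a  # v_tnow acts as a 1x2 row vector
col_result = a @ v_tnow  # v_tnow acts as a 2x1 column vector

# both products are 1D arrays of shape (2,)
same_shape = row_result.shape == col_result.shape == (2,)

# v @ a equals a.T @ v, so the values match a @ v only for symmetric a
matches_transpose = np.allclose(row_result, a.T @ v_tnow)
values_coincide = np.allclose(row_result, col_result)
```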

This makes it possible to use exactly the same order of variables in the code as in the ARPM Lab. For example, if the formula in the ARPM Lab reads \[ \bar{\boldsymbol{x}}^{\mathit{Reg}} = \mathbb{E}\{\boldsymbol{X}\} + \mathbb{C}v\{\boldsymbol{X},\boldsymbol{Z}\}(\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\}) \] then in the code it must read

x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

Hence, in the static case, the order of appearance of the variables in the code must exactly follow the order in the ARPM Lab.

However, depending on the situation, optimized techniques may be used, such as

np.linalg.solve(cv_z, z - e_z)

to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
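
As an illustration, the sketch below evaluates the regression formula both ways on made-up inputs. All numerical values and the name x_bar_reg_solve are assumptions introduced here.

```python
import numpy as np

# made-up inputs for illustration
e_x = np.array([0.1, 0.2])            # E{X}, shape (n_,)
e_z = np.array([0.0, 0.5, 1.0])       # E{Z}, shape (k_,)
cv_x_z = np.array([[0.3, 0.1, 0.0],
                   [0.2, 0.4, 0.1]])  # Cv{X, Z}, shape (n_, k_)
cv_z = np.array([[2.0, 0.3, 0.1],
                 [0.3, 1.5, 0.2],
                 [0.1, 0.2, 1.0]])    # Cv{Z}, shape (k_, k_)
z = np.array([0.4, 0.1, 1.3])         # observed z, shape (k_,)

# literal transcription of the formula, same order as in the ARPM Lab
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

# same quantity without forming the inverse explicitly
x_bar_reg_solve = e_x + cv_x_z @ np.linalg.solve(cv_z, z - e_z)
```

For a small, well-conditioned matrix like this one, the two results agree to floating-point precision, and the literal transcription is easier to compare with the formula.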

Dynamic matrix algebra

Consider the dynamic case where you need to multiply an \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of an \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[ \{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}} \] Then, according to the table in section Variables dimension, the variable x is a (j_, n_) array and the variable b is an (m_, n_) array. In such cases, the Python code should read

x_bar = x @ b.T

A similar rationale applies to other variable types, such as time series or paths of multidimensional objects, see section Variables dimension.
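
The sketch below compares the vectorized form with a scenario-by-scenario loop. The sizes, the random inputs and the name x_bar_loop are assumptions made for illustration.

```python
import numpy as np

j_, n_, m_ = 5, 3, 2  # made-up sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((j_, n_))  # scenarios, shape (j_, n_)
b = rng.standard_normal((m_, n_))  # matrix, shape (m_, n_)

# vectorized form: one row per transformed scenario
x_bar = x @ b.T  # shape (j_, m_)

# reference: apply b to each scenario separately
x_bar_loop = np.array([b @ x[j] for j in range(j_)])
```

Because the scenarios are stored as rows, the product b x^{(j)} for all j becomes a right-multiplication by the transpose of b.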

Function overloading

Inputs

If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, all of the below should be valid queries for the simulate_normal function

simulate_normal(0, 1, 100)
simulate_normal(np.array([0, 0]), np.array([[1, 0],[0, 1]]), 100)

Outputs

The output of such functions can be divided into two classes: scenarios (or time series) and parameters.

  • The scenarios (or time series) should be of shape (j_, n_) when n_ > 1 and (j_, ) when n_ == 1, as discussed in section Variables dimension. In the univariate case, the contributor must make sure that the output is of shape (j_, ) no matter what the shape of the input is (the shape of the output often depends on the shape of the input). In the special case j_ == 1 and n_ == 1, the output should be just a scalar, not an array.
  • When n_ == 1, the output parameters should be just scalars, not arrays that contain only one element. For instance, the outputs of the meancov_sp function

mu, sig2 = meancov_sp(x, p)

where x.shape == (j_, ), must be scalars, not NumPy arrays.
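
One possible way to meet these input and output rules is sketched below. The functions simulate_normal_sketch and meancov_sp_sketch are hypothetical stand-ins written for this illustration; they are not the ARPM implementations of simulate_normal and meancov_sp, and their signatures are assumptions.

```python
import numpy as np


def simulate_normal_sketch(mu, sigma2, j_, seed=None):
    """Hypothetical illustration, not the ARPM simulate_normal."""
    rng = np.random.default_rng(seed)
    mu = np.atleast_1d(mu).astype(float)          # shape (n_,)
    sigma2 = np.atleast_2d(sigma2).astype(float)  # shape (n_, n_)
    n_ = mu.shape[0]
    x = rng.multivariate_normal(mu, sigma2, size=j_)  # shape (j_, n_)
    if n_ == 1:
        x = x[:, 0]  # univariate case: shape (j_,)
        if j_ == 1:
            return float(x[0])  # single univariate scenario: scalar
    return x


def meancov_sp_sketch(x, p):
    """Hypothetical illustration of scalar outputs when n_ == 1."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    univariate = x.ndim == 1
    x2d = x.reshape(len(p), -1)        # shape (j_, n_)
    mu = p @ x2d                       # shape (n_,)
    dev = x2d - mu
    sig2 = (dev * p[:, None]).T @ dev  # shape (n_, n_)
    if univariate:
        return float(mu[0]), float(sig2[0, 0])
    return mu, sig2


x_uni = simulate_normal_sketch(0, 1, 100, seed=0)
x_multi = simulate_normal_sketch(np.array([0, 0]),
                                 np.array([[1, 0], [0, 1]]), 100, seed=0)
x_one = simulate_normal_sketch(0, 1, 1, seed=0)

p = np.full(100, 1 / 100)  # equal probabilities, made up for the example
mu, sig2 = meancov_sp_sketch(x_uni, p)
mu_multi, sig2_multi = meancov_sp_sketch(x_multi, p)
```

The inputs are promoted to arrays once at the top of each function, and the univariate case is reduced back to (j_, ) arrays or scalars just before returning.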

Code optimization

Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.

Optimized techniques should not be used when the speed/accuracy gain is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code

np.linalg.solve(sigma2, np.eye(5))

brings little to no speed/accuracy gain, because sigma2 is a small matrix of a known size. In this case, the code must be

np.linalg.inv(sigma2)

On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique

np.linalg.solve(sigma2, v)

should be used.

If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.

Here we describe the content of the ARPM coding environments across languages.

All the code environments contain scripts, functions and usage example scripts.

  • The scripts implement the Case studies and toy examples, following the Theory.
  • The functions, which are called by the scripts, gather the most frequently used sequences of instructions that perform specific tasks, implementing the algorithms described in the Theory. The ARPM functions are divided by topic, see the Documentation.
  • Each of the usage example scripts implements a simple use case of a given function: it shows how the function is called and how its return values are assigned.

In each coding environment users find two main directories: one for the code (scripts and functions) and one for the databases.

  • The code is in the directory named after the coding language, which has two sub-directories, one for the scripts, and one for functions:
    • the scripts are grouped in the scripts directory, which in turn has two sub-directories:
      • sources containing the actual scripts created from the Documentation;
      • notebooks containing the Jupyter Notebook (Live Script for MATLAB) implementation of the scripts in the sources directory;
    • the functions are grouped in the functions directory, which in turn has sub-directories for the various topics;
      • usage example scripts for functions are stored in the usage-examples sub-directory of the functions directory.
  • The databases are in the databases directory, which in turn has two sub-directories:
    • global-databases containing static data that is used as input of scripts, common to all implementations;
    • temporary-databases containing dynamic data that is the output of a script and the input of, at least, another script, specific to the implementation.


Here we describe the protocol by which a contributor creates and maintains ARPM’s implementation of the ARPM Lab Code (scripts and functions), deployed by ARPM on its website.

The protocol applies across all the coding languages implemented on the ARPM Lab.

Below, ARPM means the company and/or its employees, depending on the context.

Git environment

The code is hosted on GitLab in the private git repository /arpm-lab/arpm-python, which ARPM created, owns and regulates, following the Shared Repository Model.

In the /arpm-lab/arpm-python git repository files are organized according to the following directory tree:

Repositories:(*)

  • Python/
    • arpym/arpym/
      • estimation/
      • portfolio/
      • pricing/
      • statistics/
      • tools/
      • views/
      • usage-examples/
    • scripts/
      • sources/
  • databases/
    • (global-databases/)
    • temporary-databases/

(*) The name and structure of the functions folder might slightly differ among code languages.

The global-databases directory is a git sub-module added from the /arpm-lab/arpm-global-databases repository. The content of the other directories is copied by ARPM into the Coding Environment available to the ARPM Lab users.

ARPM maintains and publishes a code-dashboard that tracks the status of the ARPM Lab Code Documentation. The dashboard lists the scripts and functions that are documented in the ARPM Lab Code Documentation and implemented in Python.

Code creation

The contributor creates the Python implementation:

  • the implementation is based on the ARPM Lab Code Documentation and/or relevant parts of the ARPM Lab Theory, as referenced by the ARPM Lab Code Documentation;
  • the contributor is strictly committed to following the coding standards described on the ARPM website here;
  • in most cases, the creation of a code comes alongside the creation of the respective ARPM Lab Code Documentation;
  • if the ARPM Lab Code Documentation is not clear or complete enough to make the implementation possible, the contributor must escalate the issue to ARPM, which fixes or clarifies the ARPM Lab Code Documentation;
  • a script is “ready” when all the functions called by it are ready too;
  • the implementation of a function includes the implementation of the corresponding usage example script.

Code submission

The contributor submits the Python implementation:

  • the contributor pushes the changes to the shared git repository in a personal branch and notifies ARPM;
  • ARPM signs off that the code in the repository is in a consistent state, in that:
    • notebooks, scripts, functions, usage example scripts and databases run jointly without errors;
    • notebooks, scripts, functions, usage example scripts fully mirror the ARPM Lab Code Documentation;
  • the ARPM Researcher merges the contributor’s personal branch with the develop branch;
  • ARPM deploys the code from the develop branch to the ARPM beta website;
  • the contributor checks that all the components of the code are correctly linked to and from the rest of the ARPM Lab;
  • the ARPM Researcher merges the develop branch into the master branch;
  • ARPM deploys the code from the master branch to the ARPM production website.

Code deployment

ARPM deploys the code to the ARPM Lab for all the users.

Code maintenance

For the ongoing maintenance of the ARPM code:

  • ARPM is committed to ensuring that the code works with new versions of the Python runtime, by updating the code accordingly;
  • the contributor is committed to updating the code according to the updates made to the ARPM Lab Code Documentation by ARPM, using the same protocol used for the code creation, where the revised documentation is listed in the code-dashboard.