ARPM Coding Standards for Python

This document describes the Python coding standards that contributors must strictly follow.

Coding style

Always follow the PEP8 style guide.

For naming rules, follow Google's naming convention:

| Type | Public | Internal |
|---|---|---|
| Packages | lower_with_under | |
| Modules | lower_with_under | _lower_with_under |
| Classes | CapWords | _CapWords |
| Exceptions | CapWords | |
| Functions | lower_with_under() | _lower_with_under() |
| Global/Class Constants | CAPS_WITH_UNDER | _CAPS_WITH_UNDER |
| Global/Class Variables | lower_with_under | _lower_with_under |
| Instance Variables | lower_with_under | _lower_with_under (protected) or __lower_with_under (private) |
| Method Names | lower_with_under() | _lower_with_under() (protected) or __lower_with_under() (private) |
| Function/Method Parameters | lower_with_under | |
| Local Variables | lower_with_under | |

In general, all variable, function and object names in the code must follow the names used in the ARPM Lab. For example:

  • the time series \(\{x_{t}\}_{t=1}^{\bar{t}}\) in the ARPM Lab should be called x in the code, indexed by t in range(t_);
  • the routine \(\mathit{fit\_locdisp\_mlfp\_difflength}\) in the ARPM Lab should be called fit_locdisp_mlfp_difflength in the code.
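The naming rules above can be illustrated with a minimal sketch. The class, function and constant names below are hypothetical and chosen only for illustration; they are not part of the ARPM codebase.

```python
import numpy as np

MAX_ITER = 100      # global constant: CAPS_WITH_UNDER (hypothetical)
_TOLERANCE = 1e-9   # internal global constant: _CAPS_WITH_UNDER (hypothetical)


class MovingAverage:  # class: CapWords (hypothetical)
    def __init__(self, window_size):
        self.window_size = window_size  # public instance variable
        # protected instance variable: equal weights over the window
        self._weights = np.full(window_size, 1 / window_size)

    def apply(self, x):  # method: lower_with_under()
        return np.convolve(x, self._weights, mode='valid')


def cumulative_sum(x):  # function: lower_with_under() (hypothetical)
    t_ = x.shape[0]  # \bar{t} in the ARPM Lab becomes t_ in the code
    y = np.zeros(t_)
    running_total = 0.0  # local variable: lower_with_under
    for t in range(t_):  # the time series x is indexed by t in range(t_)
        running_total += x[t]
        y[t] = running_total
    return y


x = np.array([1.0, 2.0, 3.0, 4.0])
x_smooth = MovingAverage(2).apply(x)
x_cum = cumulative_sum(x)
```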

The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).

The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).

For inline comments, please see here.

For docstrings (comments on modules, functions and classes), please see section Docstrings.

Scripts must not run other scripts, i.e. the command

from s_script_title1 import *

is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimension.
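As a minimal sketch of this hand-off, the snippet below mimics the end of one script and the start of another. The file name db_script_title1.csv, the temporary folder and the column name x are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# hypothetical database location; a real project would use a shared folder
db_path = os.path.join(tempfile.mkdtemp(), 'db_script_title1.csv')

# end of s_script_title1: save the output as a database
t_ = 5
x = np.arange(t_, dtype=float)  # univariate realized process, shape (t_, )
pd.DataFrame({'x': x}).to_csv(db_path, index=False)  # stored as (t_, 1)

# start of s_script_title2: load the database instead of importing
# s_script_title1
x_loaded = pd.read_csv(db_path)['x'].values  # back to shape (t_, )
```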

Scripts must be as modular as possible: whenever code is copied and pasted, the contributor must consider creating a function for those operations.

Scripts must be as simple as possible: whenever advanced optimizations or computations are needed, the contributor must consider creating a function for those operations. See more in section Code optimization.

Docstrings

Functions docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

  • one link to the respective ARPM Lab Code Documentation;
  • optional “See also” links;
  • type and shape of the input;
  • type and shape of the output.

# -*- coding: utf-8 -*-

import numpy as np

def single_output(x, y, z=None, *, option1='a', option2='c'):
    """For details, see here.

    Parameters
    ----------
    x : float
    y : array, shape (i_bar, )
    z : array, optional, shape (i_bar, j_bar)
    option1 : str, optional
    option2 : str, optional

    Returns
    -------
    g : float
    """

    # Step 1: Do this
    w = np.sin(x)

    # Step 2: Do that
    g = w + 3

    return g

Scripts docstring

The docstring and comments must strictly follow the template below.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.4'
#       jupytext_version: 1.1.5
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# # s_script_name
# For details, see here.

# +
import internal_packages

import external_packages
# -

# ## Input parameters

param1 = 1
param2 = 2

# ## Step 1: Compute x

x = param1 + param2

# ## Step 2: Compute y

y = x - 1

Variables dimension

The standards for the NumPy variables and CSV files are given in the table below:

| Variable | Type | NumPy | DB (CSV) |
|---|---|---|---|
| Univariate realized process | Time series - past \(\bar{t}\) steps | (t_, ) | (t_, 1) |
| Univariate random variable | \(\bar{\jmath}\) MC scenarios | (j_, ) | (j_, 1) |
| Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_) | (j_*u_, 1) |
| \(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_) | (t_, n_) |
| \(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_) | (j_, n_) |
| \(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_) | (j_*u_, n_) |
| \((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_, k_) | (t_, n_*k_) |
| \((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_, k_) | (j_, n_*k_) |
| \((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_, k_) | (j_*u_, n_*k_) |

To convert a NumPy array into a dataframe, the rule is:

Group all the dimensions in two buckets: first, those you want to be indices, then, those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
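The rule can be sketched for an \(\bar{n}\)-variate random process of shape (j_, u_, n_), which becomes a (j_*u_, n_) dataframe. The index names j, u and the column names x_1, x_2 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

j_, u_, n_ = 3, 4, 2
x = np.arange(j_ * u_ * n_, dtype=float).reshape(j_, u_, n_)

# first bucket (indices): scenario j and future step u
# second bucket (headers): the n_ components
index = pd.MultiIndex.from_product([range(j_), range(u_)], names=['j', 'u'])
df = pd.DataFrame(x.reshape(j_ * u_, n_), index=index, columns=['x_1', 'x_2'])

# the grouping loses no information: the original array can be recovered
x_back = df.values.reshape(j_, u_, n_)
```

The row order produced by reshape matches the order of MultiIndex.from_product, so the entry for scenario j and step u sits at df.loc[(j, u)].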

Matrix algebra

Static matrix algebra

A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a 1D NumPy array of shape (n_, ) in Python (see Variables dimension). For example, \[ \boldsymbol{v}_{t_{\mathit{now}}} \equiv \begin{pmatrix}v_{1,t_{\mathit{now}}} \\ v_{2,t_{\mathit{now}}} \end{pmatrix} = \begin{pmatrix}\$14.24 \\ \$48.61 \end{pmatrix} \] must read in Python

v_tnow = np.array([14.24, 48.61])

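NumPy handles 1D arrays flexibly. Namely, both

v_tnow @ np.array([[1, 2], [3, 4]])

and

np.array([[1, 2], [3, 4]]) @ v_tnow

are valid and return a 1D array of shape (2, ), because the 1D array is treated as a row vector \(1\times 2\) in the first product and as a column vector \(2\times 1\) in the second. The numerical values of the two products differ in general, since the matrix is not symmetric.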
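A quick check of how the 1D array behaves in the two products is sketched below; the names a, row_product and col_product are illustrative. Both results have shape (2, ), while their entries are not equal because the matrix is not symmetric.

```python
import numpy as np

v_tnow = np.array([14.24, 48.61])
a = np.array([[1, 2], [3, 4]])

row_product = v_tnow @ a  # v_tnow acts as a 1 x 2 row vector
col_product = a @ v_tnow  # v_tnow acts as a 2 x 1 column vector

print(row_product.shape, col_product.shape)
```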

This makes it possible to use exactly the same order of variables in the code as in the ARPM Lab. For example, if the formula in the ARPM Lab reads \[ \bar{\boldsymbol{x}}^{\mathit{Reg}} = \mathbb{E}\{\boldsymbol{X}\} + \mathbb{C}v\{\boldsymbol{X},\boldsymbol{Z}\}(\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\}) \] then in the code it must read

x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

Hence, in the static case, the order of appearance of the variables in the code must exactly follow the order in the ARPM Lab.

However, depending on the situation, optimized techniques may be used such as

np.linalg.solve(cv_z, z - e_z)

to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
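The two ways of writing the regression formula can be compared on a small example. The numerical values below are made up for illustration, and the name x_bar_reg_solve is hypothetical.

```python
import numpy as np

# made-up inputs: X is 2-dimensional, Z is 3-dimensional
e_x = np.array([1.0, 2.0])
e_z = np.array([0.5, -0.5, 0.0])
cv_x_z = np.array([[0.2, 0.1, 0.0],
                   [0.0, 0.3, 0.1]])
cv_z = np.array([[2.0, 0.5, 0.0],
                 [0.5, 1.0, 0.2],
                 [0.0, 0.2, 1.5]])
z = np.array([1.0, 0.0, -1.0])

# same order of variables as in the ARPM Lab formula
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

# optimized variant: solve the linear system instead of forming the inverse
x_bar_reg_solve = e_x + cv_x_z @ np.linalg.solve(cv_z, z - e_z)
```

For a small, well-conditioned cv_z such as this one, the two results agree to numerical precision, so the first form is preferable for readability.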

Dynamic matrix algebra

Consider the dynamic case where you need to multiply an \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of an \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[ \{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}} \] Then, according to the table in section Variables dimension, the variable x would be a (j_, n_) array and the variable b would be a (m_, n_) array. In such cases, the Python code should read

x_bar = x @ b.T

so that x_bar is a (j_, m_) array. A similar rationale applies to other variable types, such as time series or paths of multidimensional objects; see section Variables dimension.
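As a sanity check of the dynamic rule, the sketch below compares the vectorized product with an explicit loop over the scenarios. The sizes, the random inputs and the name x_bar_loop are assumptions made for illustration.

```python
import numpy as np

j_, n_, m_ = 1000, 3, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((j_, n_))  # j_ scenarios of an n_-dimensional variable
b = rng.standard_normal((m_, n_))  # m_ x n_ matrix

# vectorized: each row of x_bar is b @ x[j]
x_bar = x @ b.T  # shape (j_, m_)

# explicit loop over scenarios, for comparison only
x_bar_loop = np.array([b @ x[j] for j in range(j_)])
```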

Function overloading

Inputs

If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, both of the calls below should be valid for the simulate_normal function:

simulate_normal(0, 1, 100)
simulate_normal(np.array([0, 0]), np.array([[1, 0], [0, 1]]), 100)

Outputs

The output of such functions can be divided into two classes: scenarios (or time series) and parameters.

  • The scenarios (or time series) should be of the shape (j_, n_) when n_ > 1 and (j_,) when n_ == 1, as discussed in section Variables dimension. The contributor must make sure that, when n_ == 1, the output has shape (j_,) regardless of the shape of the input (the shape of the output often depends on the shape of the input). In the special case j_ == 1 and n_ == 1, the output should be a scalar, not an array.
  • When n_ == 1, the output parameters should be scalars, not arrays containing a single element. For instance, in the call

mu, sig2 = meancov_sp(x, p)

with x.shape == (j_, ), the outputs mu and sig2 must be scalars, not NumPy arrays.
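One possible way to meet the input and output rules is sketched below. The functions simulate_normal_sketch and meancov_sp_sketch are hypothetical stand-ins written for this illustration, not the ARPM implementations, and the seed argument is an added assumption.

```python
import numpy as np


def simulate_normal_sketch(mu, sigma2, j_, seed=None):
    """Hypothetical stand-in for simulate_normal, not the ARPM routine."""
    mu = np.atleast_1d(mu).astype(float)  # scalar or (n_, ) -> (n_, )
    n_ = mu.shape[0]
    sigma2 = np.asarray(sigma2, dtype=float).reshape(n_, n_)
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, sigma2, size=j_)  # shape (j_, n_)
    if n_ == 1:
        x = x[:, 0]  # shape (j_, )
        if j_ == 1:
            return float(x[0])  # scalar when j_ == 1 and n_ == 1
    return x


def meancov_sp_sketch(x, p):
    """Hypothetical stand-in for meancov_sp: weighted mean and covariance."""
    x2d = x.reshape(x.shape[0], -1)  # shape (j_, n_)
    mu = p @ x2d  # shape (n_, )
    x_centered = x2d - mu
    sig2 = (x_centered * p[:, np.newaxis]).T @ x_centered  # shape (n_, n_)
    if x.ndim == 1:
        return mu.item(), sig2.item()  # scalars in the univariate case
    return mu, sig2


x_uni = simulate_normal_sketch(0, 1, 100, seed=0)
x_multi = simulate_normal_sketch(np.array([0, 0]),
                                 np.array([[1, 0], [0, 1]]), 100, seed=0)
x_single = simulate_normal_sketch(0, 1, 1, seed=0)

p = np.full(100, 1 / 100)  # equal probabilities
mu, sig2 = meancov_sp_sketch(x_uni, p)
mu_multi, sig2_multi = meancov_sp_sketch(x_multi, p)
```

Normalizing the inputs to a common (j_, n_) layout at the top and reducing the shape only at the end keeps the univariate and multivariate cases on a single code path.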

Code optimization

Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.

Optimized techniques should not be used when the gain in speed or accuracy is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code

np.linalg.solve(sigma2, np.eye(5))

brings little to no speed/accuracy gain, because sigma2 is a small matrix of known size. In this case, the code must be

np.linalg.inv(sigma2)

On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique

np.linalg.solve(sigma2, v)

should be used.
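The calls discussed in this section can be tried on a small example. The matrix below is a made-up, well-conditioned 5×5 case, so the sketch only shows that the variants agree; it does not demonstrate the accuracy gain expected for large ill-conditioned matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
sigma2 = a @ a.T + 5 * np.eye(5)  # symmetric, positive definite
v = rng.standard_normal(5)

# small matrix of known size: the plain inverse is the clearer choice
sigma2_inv = np.linalg.inv(sigma2)

# same result, but harder to read
sigma2_inv_solve = np.linalg.solve(sigma2, np.eye(5))

# (sigma2)^{-1} v computed without forming the inverse explicitly
w = np.linalg.solve(sigma2, v)
```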

If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.
