ARPM Coding Standards for Python
Here we describe the Python coding standards that every contributor must follow.
Coding style
Always follow the PEP 8 style guide.
For naming rules, follow Google's naming convention:
Type  Public  Internal
Packages  lower_with_under
Modules  lower_with_under  _lower_with_under
Classes  CapWords  _CapWords
Exceptions  CapWords
Functions  lower_with_under()  _lower_with_under()
Global/Class Constants  CAPS_WITH_UNDER  _CAPS_WITH_UNDER
Global/Class Variables  lower_with_under  _lower_with_under
Instance Variables  lower_with_under  _lower_with_under (protected) or __lower_with_under (private)
Method Names  lower_with_under()  _lower_with_under() (protected) or __lower_with_under() (private)
Function/Method Parameters  lower_with_under
Local Variables  lower_with_under
In general, any variable, function or object name in the code must follow the name used in the ARPM Lab. For example:
 the time series \(\{x_{t}\}_{t=1}^{\bar{t}}\) in the ARPM Lab should be called x in the code, indexed by t in range(t_);
 the routine \(\mathit{fit\_locdisp\_mlfp\_difflength}\) in the ARPM Lab should be called fit_locdisp_mlfp_difflength in the code.
The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).
The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).
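The naming rules above can be sketched in a short module. All names below (MAX_ITERATIONS, ScenarioSet, cumulative_sum) are hypothetical illustrations chosen for this example, not ARPM Lab routines.

```python
import numpy as np

# Global constants: CAPS_WITH_UNDER (public) and _CAPS_WITH_UNDER (internal)
MAX_ITERATIONS = 100
_DEFAULT_TOLERANCE = 1e-9


class ScenarioSet:  # class name: CapWords (hypothetical class)
    def __init__(self, x):
        self.x = np.asarray(x)          # public instance variable
        self._length = self.x.shape[0]  # protected instance variable

    def length(self):  # public method: lower_with_under()
        return self._length


def cumulative_sum(x):  # public function: lower_with_under()
    """Running sum of the time series x, indexed by t in range(t_)."""
    t_ = x.shape[0]
    y = np.zeros(t_)
    for t in range(t_):
        y[t] = x[t] + (y[t - 1] if t > 0 else 0.0)
    return y


x = np.array([1.0, 2.0, 3.0])  # time series named x, as in the ARPM Lab
y = cumulative_sum(x)
```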
For inline comments, please see here.
For docstrings (comments on modules, functions and classes), please see section Docstrings.
Scripts must not run other scripts, i.e. the command
from s_script_title1 import *
is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimensions.
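A minimal sketch of this hand-off between two scripts, assuming pandas is used for the .csv files. The file name db_example.csv, the column name x and the temporary folder are hypothetical; actual scripts would write to the shared database location.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical database location, used here only to keep the example self-contained
db_path = os.path.join(tempfile.mkdtemp(), 'db_example.csv')

# End of s_script_title1: save the result as a database
x = np.array([0.1, 0.2, 0.3])  # univariate time series, shape (t_, )
pd.DataFrame({'x': x}).to_csv(db_path, index=False)  # stored with shape (t_, 1)

# Start of s_script_title2: load the database instead of importing the script
db = pd.read_csv(db_path)
x_loaded = db['x'].values  # back to shape (t_, )
```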
Scripts must be as modular as possible: whenever code is copied and pasted, the contributor must evaluate the option of creating a function for those operations.
Scripts must be as simple as possible: whenever there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.
Docstrings
Functions docstring
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:
 one link to the respective ARPM Lab Code Documentation;
 optional “See also” links;
 type and shape of the input;
 type and shape of the output.
# -*- coding: utf-8 -*-
import numpy as np


def single_output(x, y, z=None, *, option1='a', option2='c'):
    """For details, see here.

    Parameters
    ----------
        x : float
        y : array, shape (i_bar, )
        z : array, optional, shape (i_bar, j_bar)
        option1 : str, optional
        option2 : str, optional

    Returns
    -------
        g : bool

    """

    # Step 1: Do this
    w = np.sin(x)

    # Step 2: Do that
    g = w + 3

    return g
Scripts docstring
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:
 one link to the respective ARPM Lab Code Documentation;
 optional “See also” links.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.4'
#       jupytext_version: 1.1.5
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

# # s_script_name
# For details, see here.

# +
import internal_packages
import external_packages
# -

# ## Input parameters

param1 = 1
param2 = 2

# ## Step 1: Compute x

x = param1 + param2

# ## Step 2: Compute y

y = x - 1
Variables dimensions
The standards for the NumPy variables and CSV files are given in the table below:
Variable  Type  NumPy  DB (CSV)
Univariate realized process  Time series, past \(\bar{t}\) steps  (t_, )  (t_, 1)
Univariate random variable  \(\bar{\jmath}\) MC scenarios  (j_, )  (j_, 1)
Univariate random process  \(\bar{\jmath}\) MC scenarios, \(\bar{u}\) future steps  (j_, u_)  (j_*u_, 1)
\(\bar{n}\)-variate realized process  Time series, past \(\bar{t}\) steps  (t_, n_)  (t_, n_)
\(\bar{n}\)-variate random variable  \(\bar{\jmath}\) MC scenarios  (j_, n_)  (j_, n_)
\(\bar{n}\)-variate random process  \(\bar{\jmath}\) MC scenarios, \(\bar{u}\) future steps  (j_, u_, n_)  (j_*u_, n_)
\((\bar{n}\times\bar{k})\)-variate realized process  Time series, past \(\bar{t}\) steps  (t_, n_, k_)  (t_, n_*k_)
\((\bar{n}\times\bar{k})\)-variate random variable  \(\bar{\jmath}\) MC scenarios  (j_, n_, k_)  (j_, n_*k_)
\((\bar{n}\times\bar{k})\)-variate random process  \(\bar{\jmath}\) MC scenarios, \(\bar{u}\) future steps  (j_, u_, n_, k_)  (j_*u_, n_*k_)
To convert a NumPy array into a dataframe, the rule is:
Group all the dimensions in two buckets: first, those you want to be indices, then, those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
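As an illustration of this rule, the sketch below converts the scenarios of an n_-variate random process from shape (j_, u_, n_) to a dataframe of shape (j_*u_, n_). The sizes, the index names j and u, and the column labels x_0, x_1 are assumptions made for the example only.

```python
import numpy as np
import pandas as pd

j_, u_, n_ = 4, 3, 2  # scenarios, future steps, variables (illustrative sizes)
x = np.arange(j_ * u_ * n_, dtype=float).reshape(j_, u_, n_)

# Indices: (scenario, step); headers: the n_ variables, giving shape (j_*u_, n_)
index = pd.MultiIndex.from_product([range(j_), range(u_)], names=['j', 'u'])
df = pd.DataFrame(x.reshape(j_ * u_, n_), index=index,
                  columns=['x_' + str(n) for n in range(n_)])

# Reading the values back restores the original NumPy shape
x_back = df.values.reshape(j_, u_, n_)
```

Because reshape keeps the last dimension as the fastest-varying one, the row for scenario j and step u is row j*u_ + u of the dataframe.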
Static matrix algebra
A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a 1D NumPy array of shape (n_, ) in Python (see Variables dimensions). For example \[
\boldsymbol{v}_{t_{\mathit{now}}} \equiv ( v_{1,t_{\mathit{now}}}, v_{2,t_{\mathit{now}}} )' = (\$14.24, \$48.61)'
\] must read in Python
v_tnow = np.array([14.24, 48.61])
NumPy handles 1D arrays flexibly. Namely,
v_tnow @ np.array([[1, 2], [3, 4]])
and
np.array([[1, 2], [3, 4]]) @ v_tnow
are both allowed, because the 1D array is treated as a row vector \(1\times 2\) in the first expression and as a column vector \(2\times 1\) in the second.
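The two products can be checked directly. The snippet below is only an illustration; the variable names a, row_times_matrix and matrix_times_col are chosen for this example.

```python
import numpy as np

v_tnow = np.array([14.24, 48.61])  # shape (n_, ) with n_ = 2
a = np.array([[1, 2], [3, 4]])

row_times_matrix = v_tnow @ a  # v_tnow acts as a 1 x 2 row vector
matrix_times_col = a @ v_tnow  # v_tnow acts as a 2 x 1 column vector

# Both results are again 1D arrays of shape (2, )
print(row_times_matrix.shape, matrix_times_col.shape)
```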
This makes it possible to use exactly the same order of variables in the code as in the ARPM Lab. For example, if the formula in the ARPM Lab reads \[ \bar{\boldsymbol{x}}^{\mathit{Reg}} = \mathbb{E}\{\boldsymbol{X}\} + \mathbb{C}v\{\boldsymbol{X},\boldsymbol{Z}\}(\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\}) \] then in the code it must read
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)
Hence, in the static case, the order of appearance of the variables in the code must exactly follow the order in the ARPM Lab.
However, depending on the situation, optimized techniques may be used such as
np.linalg.solve(cv_z, z - e_z)
to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
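A small sketch of the regression formula above, written once in the order of the ARPM Lab formula and once with np.linalg.solve. The numerical inputs are made up for illustration, with a 2-dimensional X and a 3-dimensional Z.

```python
import numpy as np

# Illustrative inputs (assumptions): X is 2-dimensional, Z is 3-dimensional
e_x = np.array([1.0, 2.0])
e_z = np.array([0.0, 0.5, 1.0])
cv_x_z = np.array([[0.2, 0.1, 0.0],
                   [0.0, 0.3, 0.1]])
cv_z = np.array([[1.0, 0.2, 0.1],
                 [0.2, 2.0, 0.3],
                 [0.1, 0.3, 1.5]])
z = np.array([0.1, 0.4, 1.2])

# Same order of variables as in the formula
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

# Equivalent form that avoids building the explicit inverse
x_bar_reg_solve = e_x + cv_x_z @ np.linalg.solve(cv_z, z - e_z)
```

For a small, well-conditioned cv_z such as this one the two results agree to numerical precision, so the first form is preferable for readability.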
Dynamic matrix algebra
Consider the dynamic case where you need to multiply a \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of a \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[
\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}}
\] Then, according to the table in section Variables dimensions, the variable x would be a (j_, n_) array and the variable b would be a (m_, n_) array. In such cases, the Python code should read
x_bar = x @ b.T
which yields a (j_, m_) array.
A similar rationale applies to other variable types, such as time series or paths of multidimensional objects, see section Variables dimensions.
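The sketch below checks that the vectorized expression matches the scenario-by-scenario formula. The sizes and the random inputs are assumptions made for the example, and the explicit loop is shown for comparison only.

```python
import numpy as np

j_, n_, m_ = 5, 3, 2  # illustrative sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((j_, n_))  # scenarios, shape (j_, n_)
b = rng.standard_normal((m_, n_))  # matrix, shape (m_, n_)

x_bar = x @ b.T  # all scenarios at once, shape (j_, m_)

# Scenario-by-scenario version of the same formula, for comparison only
x_bar_loop = np.array([b @ x[j] for j in range(j_)])
```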
Function overloading
Inputs
If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, both of the calls below should be valid for the simulate_normal function:
simulate_normal(0, 1, 100)
simulate_normal(np.array([0, 0]), np.array([[1, 0],[0, 1]]), 100)
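One possible way to accept both kinds of input is to normalize them at the top of the function. The helper as_mean_cov below is hypothetical and is not part of the ARPM code base; it only sketches the idea with np.atleast_1d and np.atleast_2d.

```python
import numpy as np


def as_mean_cov(mu, sigma2):
    """Return mu as an (n_, ) array and sigma2 as an (n_, n_) array.

    Hypothetical helper: scalars are promoted to arrays with n_ = 1.
    """
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma2 = np.atleast_2d(np.asarray(sigma2, dtype=float))
    if sigma2.shape != (mu.shape[0], mu.shape[0]):
        raise ValueError('sigma2 must have shape (n_, n_)')
    return mu, sigma2


# Univariate call with scalars and multivariate call with arrays
mu_1, sigma2_1 = as_mean_cov(0, 1)
mu_2, sigma2_2 = as_mean_cov(np.array([0, 0]), np.array([[1, 0], [0, 1]]))
```

After this step the body of the function can work with a single internal representation, whatever the caller passed in.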
Outputs
The output of such functions can be divided into 2 classes: scenarios (or time series) and parameters.
 The scenarios (or time series) should be of the shape (j_, n_) when n_ > 1 and (j_, ) when n_ == 1, as discussed in section Variables dimensions. When n_ == 1, the contributor must make sure that the output is of the shape (j_, ) no matter what the shape of the input is (the shape of the output often depends on the shape of the input). As a special case, when j_ == 1 and n_ == 1 the output should be just a scalar, not an array.
 When n_ == 1, the output parameters should be just scalars, not arrays that contain only one element. For instance, the outputs of the meancov_sp function
mu, sig2 = meancov_sp(x, p)
where x.shape = (j_, ) must be scalars, not NumPy arrays.
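The output rules can be sketched as follows. Both squeeze_output and meancov_sp_sketch are hypothetical stand-ins written for this example; they are not the actual meancov_sp implementation.

```python
import numpy as np


def squeeze_output(a):
    """Drop length-1 dimensions; return a Python scalar if one element is left."""
    a = np.squeeze(np.asarray(a))
    return a.item() if a.ndim == 0 else a


def meancov_sp_sketch(x, p):
    """Mean and covariance of scenarios x with probabilities p (sketch only)."""
    x_2d = x.reshape(x.shape[0], -1)          # (j_, n_), also when x is (j_, )
    mu = p @ x_2d                             # (n_, )
    sig2 = ((x_2d - mu).T * p) @ (x_2d - mu)  # (n_, n_)
    return squeeze_output(mu), squeeze_output(sig2)


x = np.array([1.0, 2.0, 3.0, 4.0])  # shape (j_, ), i.e. n_ == 1
p = np.full(4, 0.25)
mu, sig2 = meancov_sp_sketch(x, p)  # scalars

x_multi = np.column_stack([x, 2 * x])  # shape (j_, 2)
mu_multi, sig2_multi = meancov_sp_sketch(x_multi, p)  # arrays
```

The same squeeze_output helper turns scenarios of shape (j_, 1) into shape (j_, ), and a single univariate scenario of shape (1, 1) into a scalar.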
Code optimization
Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.
Optimized techniques should not be used when the speed/accuracy gain is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code
np.linalg.solve(sigma2, np.eye(5))
brings little to no speed/accuracy gain, because sigma2 is a small matrix of a known size. In this case, the code must be
np.linalg.inv(sigma2)
On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique
np.linalg.solve(sigma2, v)
should be used.
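One way to judge whether the optimized technique is worthwhile is to measure the condition number and the residuals on the matrix at hand. The sketch below uses a 10×10 Hilbert matrix as a stand-in for an ill-conditioned sigma2; this choice, and the right-hand side v, are assumptions made for the example.

```python
import numpy as np

n_ = 10
# Hilbert matrix: a standard example of an ill-conditioned matrix (illustrative)
i = np.arange(1, n_ + 1)
sigma2 = 1.0 / (i[:, None] + i[None, :] - 1)
v = sigma2 @ np.ones(n_)  # right-hand side whose exact solution is a vector of ones

y_solve = np.linalg.solve(sigma2, v)  # solves the linear system directly
y_inv = np.linalg.inv(sigma2) @ v     # forms the explicit inverse first

# Diagnostics: how ill-conditioned the matrix is, and how well each result fits
cond = np.linalg.cond(sigma2)
res_solve = np.linalg.norm(sigma2 @ y_solve - v)
res_inv = np.linalg.norm(sigma2 @ y_inv - v)
print(cond, res_solve, res_inv)
```

The printed values depend on the matrix and on the platform, so they should be read as diagnostics rather than as a general ranking of the two approaches.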
If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.