Table of contents
Here we describe the Python coding standards, which the contributor is strictly committed to following.
Coding style
Always follow the PEP8 style guide.
For naming rules, follow Google's naming convention:
Type | Public | Internal |
---|---|---|
Packages | lower_with_under | |
Modules | lower_with_under | _lower_with_under |
Classes | CapWords | _CapWords |
Exceptions | CapWords | |
Functions | lower_with_under() | _lower_with_under() |
Global/Class Constants | CAPS_WITH_UNDER | _CAPS_WITH_UNDER |
Global/Class Variables | lower_with_under | _lower_with_under |
Instance Variables | lower_with_under | _lower_with_under (protected) or __lower_with_under (private) |
Method Names | lower_with_under() | _lower_with_under() (protected) or __lower_with_under() (private) |
Function/Method Parameters | lower_with_under | |
Local Variables | lower_with_under | |
In general, any variable, function or object name in the code must follow the name presented in the ARPM Lab. For example:
- x in the code, indexed by t in range(t_);
- fit_locdisp_mlfp_difflength in the code.

The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).
The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).
For inline comments, please see here.
For docstrings (comments on modules, functions and classes), please see section Docstrings.
Scripts must not run other scripts, i.e. the command
from s_script_title1 import *
is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimension.
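The database handoff described above can be sketched as follows; the file name db_example.csv, the column name v, and the use of pandas are assumptions for this illustration, not part of the standard.

```python
import pandas as pd

# End of the (hypothetical) script s_script_title1: save the output database
db_out = pd.DataFrame({'v': [14.24, 48.61]})
db_out.to_csv('db_example.csv', index=False)

# Start of the (hypothetical) script s_script_title2: import the saved database
# instead of running `from s_script_title1 import *`
db_in = pd.read_csv('db_example.csv')
v = db_in['v'].values
```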
Scripts must be as modular as possible: any time there is a copy&paste, the contributor must evaluate the option of creating a function for those operations.
Scripts must be as simple as possible: any time there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.
Docstrings
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

    # -*- coding: utf-8 -*-
    import numpy as np


    def single_output(x, y, z=None, *, option1='a', option2='c'):
        """For details, see here.

        Parameters
        ----------
        x : float
        y : array, shape (i_bar, )
        z : array, optional, shape (i_bar, j_bar)
        option1 : str, optional
        option2 : str, optional

        Returns
        -------
        g : bool
        """

        # Step 1: Do this
        w = np.sin(x)

        # Step 2: Do that
        g = w + 3

        return g
Scripts and their comments must strictly follow the template below:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # ---
    # jupyter:
    #   jupytext:
    #     text_representation:
    #       extension: .py
    #       format_name: light
    #       format_version: '1.4'
    #     jupytext_version: 1.1.5
    #   kernelspec:
    #     display_name: Python 3
    #     language: python
    #     name: python3
    # ---

    # # s_script_name
    # For details, see here.

    # +
    import internal_packages
    import external_packages
    # -

    # ## Input parameters

    param1 = 1
    param2 = 2

    # ## Step 1: Compute x

    x = param1 + param2

    # ## Step 2: Compute y

    y = x - 1
Variables dimensions
The standards for the NumPy variables and CSV files are given in the table below.

Variable | Type | NumPy | DB (CSV) |
---|---|---|---|
Univariate realized process | Time series - past \(\bar{t}\) steps | (t_, ) | (t_, 1) |
Univariate random variable | \(\bar{\jmath}\) MC scenarios | (j_, ) | (j_, 1) |
Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_) | (j_*u_, 1) |
\(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_) | (t_, n_) |
\(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_) | (j_, n_) |
\(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_) | (j_*u_, n_) |
\((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | (t_, n_, k_) | (t_, n_*k_) |
\((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | (j_, n_, k_) | (j_, n_*k_) |
\((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | (j_, u_, n_, k_) | (j_*u_, n_*k_) |
To convert a NumPy array into a dataframe, the rule is: group all the dimensions in two buckets, first those you want to be indices, then those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
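A sketch of this rule with assumed sizes j_ = 2, u_ = 3, n_ = 4 (the sizes and the use of pandas are illustrative): the scenario and step dimensions go into the index bucket, the variable dimension into the headers, giving a (j_*u_, n_) dataframe.

```python
import numpy as np
import pandas as pd

j_, u_, n_ = 2, 3, 4  # hypothetical sizes
x = np.arange(j_ * u_ * n_).reshape(j_, u_, n_)  # (j_, u_, n_) scenario-path array

# index bucket: (j_, u_); header bucket: (n_,)
index = pd.MultiIndex.from_product([range(j_), range(u_)], names=['j', 'u'])
df = pd.DataFrame(x.reshape(j_ * u_, n_), index=index)  # shape (j_*u_, n_)

# the NumPy standard is recovered by reshaping back
x_back = df.values.reshape(j_, u_, n_)
```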
A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a 1D NumPy array of shape (n_, ) in Python (see Variables dimension). For example \[
\boldsymbol{v}_{t_{\mathit{now}}} \equiv ( v_{1,t_{\mathit{now}}}, v_{2,t_{\mathit{now}}} )' = (\$14.24, \$48.61)'
\] must read in Python
v_tnow = np.array([14.24, 48.61])
NumPy handles 1D arrays flexibly. Namely,
v_tnow @ np.array([[1, 2], [3, 4]])
and
np.array([[1, 2], [3, 4]]) @ v_tnow
are both allowed, because the 1D array is treated both as a \(1\times 2\) row vector and as a \(2\times 1\) column vector.
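A minimal check of this behavior (the numerical values are chosen arbitrarily):

```python
import numpy as np

v_tnow = np.array([14.24, 48.61])
a = np.array([[1, 2], [3, 4]])

row = v_tnow @ a  # v_tnow acts as a 1x2 row vector; result has shape (2,)
col = a @ v_tnow  # v_tnow acts as a 2x1 column vector; result has shape (2,)
```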
This makes it possible to use exactly the same order of variables in the code as in the ARPM Lab. For example, if the formula in the ARPM Lab reads \[ \bar{\boldsymbol{x}}^{\mathit{Reg}} = \mathbb{E}\{\boldsymbol{X}\} + \mathbb{C}v\{\boldsymbol{X},\boldsymbol{Z}\}(\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\}) \] then in the code it must read
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)
Hence, in the static case, the order of appearance of the variables in the code must exactly follow the order in the ARPM Lab.
However, depending on the situation, optimized techniques may be used such as
np.linalg.solve(cv_z, z - e_z)
to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
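For instance, with made-up moments (all numbers below are illustrative, not from any ARPM case study), the literal transcription and the solve-based variant agree:

```python
import numpy as np

# hypothetical inputs: 2-dimensional X, 3-dimensional Z
e_x = np.array([0.5, 1.0])
e_z = np.array([0.1, 0.2, 0.3])
z = np.array([0.2, 0.1, 0.4])
cv_x_z = np.array([[1.0, 0.2, 0.0],
                   [0.0, 0.3, 0.1]])
cv_z = np.array([[2.0, 0.5, 0.1],
                 [0.5, 1.5, 0.2],
                 [0.1, 0.2, 1.0]])

# literal transcription of the ARPM Lab formula
x_bar_reg = e_x + cv_x_z @ np.linalg.inv(cv_z) @ (z - e_z)

# optimized variant: solve the linear system instead of inverting
x_bar_reg_opt = e_x + cv_x_z @ np.linalg.solve(cv_z, z - e_z)
```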
Consider the dynamic case where you need to multiply an \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of an \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[
\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}}
\] Then, according to the table in section Variables dimension, the variable x would be a (j_, n_) array and the variable b would be an (m_, n_) array. In such cases, the Python code should read
x_bar = x @ b.T
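The equivalence between the batched product and the scenario-by-scenario formula can be checked directly (the sizes below are arbitrary):

```python
import numpy as np

j_, n_, m_ = 5, 3, 2  # hypothetical sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((j_, n_))  # j_ scenarios of an n_-dimensional variable
b = rng.standard_normal((m_, n_))  # m_ x n_ matrix

x_bar = x @ b.T  # shape (j_, m_): row j holds b @ x[j]
```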
A similar rationale applies to other variable types, such as time series or paths of multidimensional objects, see section Variables dimension.
Function overloading
If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, all of the below should be valid calls of the simulate_normal function:
simulate_normal(0, 1, 100)
simulate_normal(np.array([0, 0]), np.array([[1, 0], [0, 1]]), 100)
The output of such functions can be divided into two classes: scenarios (or time series) and parameters.
Scenarios (or time series) must have shape (j_, n_) when n_ > 1 and (j_,) when n_ == 1, as discussed in section Variables dimension. The contributor must make sure that the output is of shape (j_,) no matter what the shape of the input is (the shape of the output often depends on the shape of the input). A special case is j_ == 1 and n_ == 1, where the output should be just a scalar, not an array.
When n_ == 1, the output parameters should be just scalars, not arrays that contain only one element. For instance, the outputs of the meancov_sp function
mu, sig2 = meancov_sp(x, p)
where x.shape = (j_, ), must be scalars, not NumPy arrays.
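A sketch of how such a function could enforce these output rules; this is not the actual ARPM implementation of meancov_sp, only an illustration of the overloading logic under the naming conventions above.

```python
import numpy as np

def meancov_sp_sketch(x, p=None):
    # scenario-probability mean and covariance; scalar outputs for 1D input
    x = np.asarray(x, dtype=float)
    j_ = x.shape[0]
    if p is None:
        p = np.full(j_, 1.0 / j_)  # equal scenario probabilities
    mu = p @ x                     # 0-d for x of shape (j_,), (n_,) otherwise
    dev = x - mu
    if x.ndim == 1:
        sig2 = p @ dev ** 2        # scalar variance, not a (1,) or (1, 1) array
    else:
        sig2 = dev.T @ (dev * p[:, None])  # (n_, n_) covariance matrix
    return mu, sig2
```

For x of shape (j_,), mu and sig2 come out as scalars; for x of shape (j_, n_), they come out as an (n_,) vector and an (n_, n_) matrix.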
Code optimization
Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.
Optimized techniques should not be used when the ratio between speed/accuracy gain and clarity is low. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code
np.linalg.solve(sigma2, np.eye(5))
brings little to no speed/accuracy gain, because sigma2 is a small matrix of a known size. In this case, the code must be
np.linalg.inv(sigma2)
On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique
np.linalg.solve(sigma2, v)
should be used.
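A quick check of the two approaches on a randomly generated system; the sizes and the regularization term are illustrative (for genuinely ill-conditioned matrices the accuracy gap in favor of solve grows):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ = 300
a = rng.standard_normal((n_, n_))
sigma2 = a @ a.T + np.eye(n_)  # symmetric positive definite matrix
v = rng.standard_normal(n_)

w_solve = np.linalg.solve(sigma2, v)  # preferred: solves sigma2 @ w = v directly
w_inv = np.linalg.inv(sigma2) @ v     # explicit inverse: slower and less accurate
```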
If there is a need for “too much” optimization, the contributor must evaluate whether the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.
Here we describe the content of the ARPM coding environments across languages.
All the code environments contain scripts, functions and usage example scripts.
In each coding environment users find two main directories: one for the code (scripts and functions) and one for the databases.
- the scripts directory, which in turn has two sub-directories:
  - sources, containing the actual scripts created from the Documentation;
  - notebooks, containing the Jupyter Notebook (Live Script for MATLAB) implementation of the scripts in the sources directory;
- the functions directory, which in turn has sub-directories for the various topics:
  - usage-examples, a sub-directory of the functions directory;
- the databases directory, which in turn has two sub-directories:
  - global-databases, containing static data that is used as input of scripts, common to all implementations;
  - temporary-databases, containing dynamic data that is the output of a script and the input of, at least, another script, specific to the implementation.
Here we describe the protocol by which a contributor creates and maintains ARPM’s implementation of the ARPM Lab Code (scripts and functions) for the ARPM Lab, deployed by ARPM on its website.
The protocol applies across all the coding languages implemented on the ARPM Lab.
In what follows, ARPM means the company and/or its employees, depending on the context.
Git environment
The code is hosted on GitLab in the private git repository /arpm-lab/arpm-python, which ARPM created, owns and regulates, following the Shared Repository Model.
In the /arpm-lab/arpm-python git repository, files are organized according to the following directory tree:
Repositories:(*)
(*) The name and structure of the functions folder might slightly differ among code languages.
The global-databases
directory is a git sub-module added from the /arpm-lab/arpm-global-databases repository. The content of the other directories is copied by ARPM into the Coding Environment available to the ARPM Lab users.
ARPM maintains and publishes a code-dashboard that tracks the status of the ARPM Lab Code Documentation. The list includes the scripts and functions which are documented in the ARPM Lab Code Documentation and implemented in Python.
Code creation
- The contributor creates the Python implementation;
- The contributor submits the Python implementation;
- ARPM merges the submitted code into the develop branch;
- ARPM deploys the develop branch to the ARPM beta website;
- ARPM merges the develop branch into the master branch;
- ARPM deploys the master branch to the ARPM production website.

ARPM deploys the code to the ARPM Lab for all the users.
Code maintenance
For the ongoing maintenance of the ARPM code:
Here we describe the content of the ARPM coding environments across languages.
All the code environments contain scripts, functions and usage example scripts.
In each coding environment users find two main directories: one for the code (scripts and functions) and one for the databases.
- the scripts directory, which in turn has two sub-directories:
  - sources, containing the actual scripts created from the Documentation;
  - notebooks, containing the Jupyter Notebook (Live Script for MATLAB) implementation of the scripts in the sources directory;
- the functions directory, which in turn has sub-directories for the various topics:
  - usage-examples, a sub-directory of the functions directory;
- the databases directory, which in turn has two sub-directories:
  - global-databases, containing static data that is used as input of scripts, common to all implementations;
  - temporary-databases, containing dynamic data that is the output of a script and the input of, at least, another script, specific to the implementation.
Here we describe the protocol by which a contributor creates and maintains ARPM’s implementation of the ARPM Lab Code (scripts and functions) for the ARPM Lab, deployed by ARPM on its website.
The protocol applies across all the coding languages implemented on the ARPM Lab.
In what follows, ARPM means the company and/or its employees, depending on the context.
Git environment
The code is hosted on GitLab in the private git repository /arpm-lab/arpm-matlab, which ARPM created, owns and regulates, following the Shared Repository Model.
In the /arpm-lab/arpm-matlab git repository, files are organized according to the following directory tree:
Repositories:(*)
(*) The name and structure of the functions folder might slightly differ among code languages.
The global-databases
directory is a git sub-module added from the /arpm-lab/arpm-global-databases repository. The content of the other directories is copied by ARPM into the Coding Environment available to the ARPM Lab users.
ARPM maintains and publishes a code-dashboard that tracks the status of the ARPM Lab Code Documentation. The list includes the scripts and functions which are documented in the ARPM Lab Code Documentation and implemented in MATLAB.
Code creation
- The contributor creates the MATLAB implementation;
- The contributor submits the MATLAB implementation;
- ARPM merges the submitted code into the develop branch;
- ARPM deploys the develop branch to the ARPM beta website;
- ARPM merges the develop branch into the master branch;
- ARPM deploys the master branch to the ARPM production website.

ARPM deploys the code to the ARPM Lab for all the users.
Code maintenance
For the ongoing maintenance of the ARPM code:
Here we describe the R coding standards, which the contributor is strictly committed to following.
Coding style
Always follow Google's R Style Guide, except for the naming rules. For the naming rules, follow Google's naming convention.
In general, any variable, function or object name in the code must follow the name presented in the ARPM Lab. For example:
- x in the code, indexed by t in 1:t_;
- fit_locdisp_mlfp_difflength in the code.

The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).
The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).
For inline comments, please see here.
For docstrings (comments on modules, functions and classes), please see section Docstrings.
Scripts must not run other scripts, i.e. the command
source("../../../R/scripts/sources/s_script_title1.R")
is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimension.
Scripts must be as modular as possible: any time there is a copy&paste, the contributor must evaluate the option of creating a function for those operations.
Scripts must be as simple as possible: any time there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.
As the assignment operator, <- should be used instead of =.
Do not use attach(); avoiding it keeps the code clearer.
Plots should be done using packages from the base R library.
Docstrings
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

    # -*- coding: utf-8 -*-

    single_output <- function(x,              # parameter1
                              y,              # parameter2
                              z = NULL,       # optional parameter1
                              option1 = 'a',  # optional parameter2
                              option2 = 'c')  # optional parameter3
    {
      # For details, see here.

      # Parameters
      # ----------
      # x : scalar
      # y : vector, dimensions (i_bar x 1)
      # z : matrix, optional, dimensions (i_bar x j_bar)
      # option1 : str, optional
      # option2 : str, optional
      #
      # Returns
      # -------
      # g : bool

      # Step 1: Do this
      w <- sin(x)

      # Step 2: Do that
      g <- w + 3

      return(g)
    }
Scripts and their comments must strictly follow the template below:

    # ---
    # jupyter:
    #   kernelspec:
    #     display_name: R
    #     language: R
    #     name: ir
    # ---

    # # s_script_name
    # For details, see here.

    # load function_name function
    source("../../../R/functions/function_file/function_name.R")

    # ## Step 1: Input parameters

    # +
    param1 <- 1
    param2 <- 2
    # -

    # ## Step 2: Compute x

    x <- param1 + param2

    # ## Step 3: Compute y

    y <- x - 1
Variables dimensions
Basic data structures in R can be organized by their dimensionality and by whether they are homogeneous or heterogeneous. The standard categorization is given in the table below. R has no 0-dimensional, or scalar, types: individual numbers or strings, which we consider as scalars, are vectors of length one.

 | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
The standards for the R variables and CSV files are given in the table below.

Variable | Type | Length/Dimension | DB (CSV) |
---|---|---|---|
Univariate realized process | Time series - past \(\bar{t}\) steps | t_ | t_ x 1 |
Univariate random variable | \(\bar{\jmath}\) MC scenarios | j_ | j_ x 1 |
Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ | j_*u_ x 1 |
\(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | t_ x n_ | t_ x n_ |
\(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | j_ x n_ | j_ x n_ |
\(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ x n_ | j_*u_ x n_ |
\((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | t_ x n_ x k_ | t_ x n_*k_ |
\((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | j_ x n_ x k_ | j_ x n_*k_ |
\((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ x n_ x k_ | j_*u_ x n_*k_ |
To convert an array into a data frame, the rule is: group all the dimensions in two buckets, first those you want to be indices, then those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a basic structure type in R, a vector of length n_ (see Variables dimension). For example \[
\boldsymbol{v}_{t_{\mathit{now}}}
\equiv
\begin{pmatrix} v_{1,t_{\mathit{now}}} \\ v_{2,t_{\mathit{now}}} \end{pmatrix}
=
\begin{pmatrix} \$14.24 \\ \$48.61 \end{pmatrix}
\] should read in R
v_tnow <- c(14.24, 48.61)
The following commands in R
v_tnow %*% matrix(1:4, nrow=2, byrow=TRUE)
and
matrix(1:4, nrow=2, byrow=TRUE) %*% v_tnow
will not produce results of the same dimensions as in Python: %*% always returns a matrix (here 1 x 2 and 2 x 1, respectively), never a plain vector. Consequently, the contributor should take care that the order of variables in the code matches the order in the ARPM Lab.
Depending on the situation, optimized techniques may be used such as
solve(cv_z, z - e_z)
to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
Consider the dynamic case where you need to multiply an \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of an \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[
\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}}
\] Then, according to the table in section Variables dimension, the variable x would be a j_ x n_ matrix and the variable b would be an m_ x n_ matrix. In such cases, the R code should read
x_bar <- x %*% t(b)
Function overloading
If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, all of the below should be valid calls of the simulate_normal function:
simulate_normal(0, 1, 100)
simulate_normal(c(0, 0), diag(c(1,1)), 100)
The contributor must make sure that the output has the correct dimensions and type no matter what the shape of the input is.
Code optimization
Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.
Optimized techniques should not be used when the ratio between speed/accuracy gain and clarity is low. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code
solve(sigma_sq, diag(5))
brings little to no speed/accuracy gain, because sigma_sq is a small matrix of a known size. In this case, the code must be
solve(sigma_sq)
On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique
solve(sigma_sq, v)
should be used.
If there is a need for “too much” optimization, the contributor must evaluate whether the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.
Here we describe the content of the ARPM coding environments across languages.
All the code environments contain scripts, functions and usage example scripts.
In each coding environment users find two main directories: one for the code (scripts and functions) and one for the databases.
- the scripts directory, which in turn has two sub-directories:
  - sources, containing the actual scripts created from the Documentation;
  - notebooks, containing the Jupyter Notebook (Live Script for MATLAB) implementation of the scripts in the sources directory;
- the functions directory, which in turn has sub-directories for the various topics:
  - usage-examples, a sub-directory of the functions directory;
- the databases directory, which in turn has two sub-directories:
  - global-databases, containing static data that is used as input of scripts, common to all implementations;
  - temporary-databases, containing dynamic data that is the output of a script and the input of, at least, another script, specific to the implementation.
Here we describe the protocol by which a contributor creates and maintains ARPM’s implementation of the ARPM Lab Code (scripts and functions) for the ARPM Lab, deployed by ARPM on its website.
The protocol applies across all the coding languages implemented on the ARPM Lab.
In what follows, ARPM means the company and/or its employees, depending on the context.
Git environment
The code is hosted on GitLab in the private git repository /arpm-lab/arpm-r, which ARPM created, owns and regulates, following the Shared Repository Model.
In the /arpm-lab/arpm-r git repository, files are organized according to the following directory tree:
Repositories:(*)
(*) The name and structure of the functions folder might slightly differ among code languages.
The global-databases
directory is a git sub-module added from the /arpm-lab/arpm-global-databases repository. The content of the other directories is copied by ARPM into the Coding Environment available to the ARPM Lab users.
ARPM maintains and publishes a code-dashboard that tracks the status of the ARPM Lab Code Documentation. The list includes the scripts and functions which are documented in the ARPM Lab Code Documentation and implemented in R.
Code creation
- The contributor creates the R implementation;
- The contributor submits the R implementation;
- ARPM merges the submitted code into the develop branch;
- ARPM deploys the develop branch to the ARPM beta website;
- ARPM merges the develop branch into the master branch;
- ARPM deploys the master branch to the ARPM production website.

ARPM deploys the code to the ARPM Lab for all the users.
Code maintenance
For the ongoing maintenance of the ARPM code: