# ARPM Coding Standards for R

Here we describe the R coding standards that contributors are strictly committed to following.

## Coding style

In general, all variable, function and object names in the code must follow the names used in the ARPM Lab. For example:

• the time series $$\{x_{t}\}_{t=1}^{\bar{t}}$$ in the ARPM Lab should be called x in the code, indexed by t in 1:t_;
• the routine $$\mathit{fit\_locdisp\_mlfp\_difflength}$$ in the ARPM Lab should be called fit_locdisp_mlfp_difflength in the code.
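Here is a minimal sketch of these naming rules, using made-up values. The series $$\{x_{t}\}_{t=1}^{\bar{t}}$$ is stored in x, its length $$\bar{t}$$ in t_, and the loop index is t. The helper variable x_sq is hypothetical and only serves the illustration.

```r
t_ <- 5                              # \bar{t} in the ARPM Lab
x <- c(1.2, 0.8, 1.5, 1.1, 0.9)      # {x_t}, t = 1, ..., t_ (illustrative values)

# Loop over t = 1, ..., t_ (seq_len(t_) equals 1:t_ for t_ >= 1)
x_sq <- numeric(t_)                  # hypothetical helper variable
for (t in seq_len(t_)) {
  x_sq[t] <- x[t]^2
}
```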

Script names follow the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).

Function names follow the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).

For docstrings (comments on modules, functions and classes), please see section Docstrings.

Scripts should not run other scripts, i.e. the command

```r
source("../../../R/scripts/sources/s_script_title1.R")
```

is not allowed. Instead, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same few, clean .csv files can be loaded in all the case studies. See section Variables dimensions for more.
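Here is a minimal sketch of this workflow. It assumes a hypothetical database file db_example.csv written to a temporary directory. Both halves are shown in one block for brevity; in practice they would live in s_script_title1 and s_script_title2, respectively.

```r
# --- In s_script_title1 (hypothetical): save the results as a CSV database ---
db_path <- file.path(tempdir(), "db_example.csv")  # illustrative path only
t_ <- 3
x <- c(0.1, -0.2, 0.05)
write.csv(data.frame(t = seq_len(t_), x = x), db_path, row.names = FALSE)

# --- In s_script_title2 (hypothetical): import the database instead of sourcing the script ---
db <- read.csv(db_path)
x_imported <- db$x
```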

Scripts must be as modular as possible: whenever code is copied and pasted, the contributor must consider creating a function for those operations.

Scripts must be as simple as possible: whenever advanced optimizations or computations are needed, the contributor must consider creating a function for those operations. See section Code optimization for more.

Use <- as the assignment operator instead of =.

Do not use attach(): refer to the columns of a data frame explicitly, so that the code stays clear.
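Here is a minimal sketch of the alternative, using a hypothetical data frame db. Columns are referenced explicitly with $, so it is always clear which object a variable comes from.

```r
db <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))  # hypothetical data

# Not allowed: attach(db); z <- x + y
# Preferred: reference the columns explicitly
z <- db$x + db$y
```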

Plots should be produced with the packages that ship with base R (e.g. graphics).

## Docstrings

### Functions docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

• one link to the respective ARPM Lab Code Documentation;
• type and dimension of the input;
• type and dimension of the output.

```r
# -*- coding: utf-8 -*-

single_output <- function(x,              # parameter1
                          y,              # parameter2
                          z = NULL,       # optional parameter1
                          option1 = 'a',  # optional parameter2
                          option2 = 'c'   # optional parameter3
                          ) {
  # For details, see here.

  # Parameters
  # ----------
  # x : scalar
  # y : vector, dimensions (i_bar x 1)
  # z : matrix, optional, dimensions (i_bar x j_bar)
  # option1 : character, optional
  # option2 : character, optional

  # Returns
  # ----------
  # g : scalar

  # ## Step 1: Do this

  w <- sin(x)

  # ## Step 2: Do that

  g <- w + 3

  return(g)
}
```

### Scripts docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

```r
# ---
# jupyter:
#   kernelspec:
#     display_name: R
#     language: R
#     name: ir
# ---

# # s_script_name
# For details, see here.

source("../../../R/functions/function_file/function_name.R")

# ## Step 1: Input parameters

# +
param1 <- 1
param2 <- 2
# -

# ## Step 2: Compute x

x <- param1 + param2

# ## Step 3: Compute y

y <- x - 1
```

## Variables dimensions

Basic data structures in R can be organized by their dimensionality and by whether they are homogeneous or heterogeneous. The standard categorization is given in the table below. R has no 0-dimensional (scalar) types: individual numbers or strings, which we may think of as scalars, are vectors of length one.

| | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | Atomic vector | List |
| 2d | Matrix | Data frame |
| nd | Array | |
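The following sketch illustrates the table and shows that a "scalar" is a vector of length one. All object names are illustrative.

```r
a <- 3.14                                   # a "scalar" is an atomic vector of length one
stopifnot(is.vector(a), length(a) == 1)

lst <- list(1, "a", TRUE)                   # 1d heterogeneous
m <- matrix(1:6, nrow = 2)                  # 2d homogeneous
df <- data.frame(n = 1:2, s = c("a", "b"))  # 2d heterogeneous
arr <- array(1:24, dim = c(2, 3, 4))        # nd homogeneous

dim(m)    # 2 3
dim(arr)  # 2 3 4
```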

The standards for the R variables and CSV files are given in the table below.

| Variable | Type | Length/Dimension | DB (CSV) |
|---|---|---|---|
| Univariate realized process | Time series - past $$\bar{t}$$ steps | t_ | t_ x 1 |
| Univariate random variable | $$\bar{\jmath}$$ MC scenarios | j_ | j_ x 1 |
| Univariate random process | $$\bar{\jmath}$$ MC scenarios - $$\bar{u}$$ future steps | j_ x u_ | j_ x u_ |
| $$\bar{n}$$-variate realized process | Time series - past $$\bar{t}$$ steps | t_ x n_ | t_ x n_ |
| $$\bar{n}$$-variate random variable | $$\bar{\jmath}$$ MC scenarios | j_ x n_ | j_ x n_ |
| $$\bar{n}$$-variate random process | $$\bar{\jmath}$$ MC scenarios - $$\bar{u}$$ future steps | j_ x u_ x n_ | j_*u_ x n_ |
| $$(\bar{n}\times\bar{k})$$-variate realized process | Time series - past $$\bar{t}$$ steps | t_ x n_ x k_ | t_ x n_*k_ |
| $$(\bar{n}\times\bar{k})$$-variate random variable | $$\bar{\jmath}$$ MC scenarios | j_ x n_ x k_ | j_ x n_*k_ |
| $$(\bar{n}\times\bar{k})$$-variate random process | $$\bar{\jmath}$$ MC scenarios - $$\bar{u}$$ future steps | j_ x u_ x n_ x k_ | j_*u_ x n_*k_ |

When saving to CSV, group all the dimensions into two buckets: first those you want as row indices, then those you want as column headers, i.e. (ind1*ind2*...*ind_i_) x (dim1*dim2*...*dim_n_).
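Here is a sketch of this rule, assuming illustrative sizes j_ = 4, u_ = 3 and n_ = 2. The snippet flattens a j_ x u_ x n_ array of scenarios into the (j_*u_) x n_ layout of the table, writes it to a temporary CSV file and restores the array. The header names n1, n2 and the file path are hypothetical. Because R arrays are stored in column-major order, the scenario index j runs fastest within each block of j_ rows.

```r
j_ <- 4; u_ <- 3; n_ <- 2
set.seed(1)
x <- array(rnorm(j_ * u_ * n_), dim = c(j_, u_, n_))  # j_ x u_ x n_ scenarios

# Indices (j, u) become rows, n becomes the header: (j_*u_) x n_.
# Column-major storage: x[j, u, n] lands in row j + (u - 1) * j_, column n.
x_db <- matrix(x, nrow = j_ * u_, ncol = n_)
colnames(x_db) <- paste0("n", seq_len(n_))  # hypothetical header names

path <- file.path(tempdir(), "x_db.csv")    # illustrative path only
write.csv(x_db, path, row.names = FALSE)

# Read the database back and restore the j_ x u_ x n_ array
x_read <- as.matrix(read.csv(path))
x_back <- array(x_read, dim = c(j_, u_, n_))
```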

## Matrix algebra

### Static matrix algebra

A vector ($$\bar{n}\times 1$$) in the ARPM Lab is represented in R by the basic vector type, i.e. a vector of length n_ (see section Variables dimensions). For example, $\boldsymbol{v}_{t_{\mathit{now}}} \equiv \begin{pmatrix}v_{1,t_{\mathit{now}}} \\ v_{2,t_{\mathit{now}}} \end{pmatrix} = \begin{pmatrix}14.24 \\ 48.61 \end{pmatrix}$ should be written in R as

```r
v_tnow <- c(14.24, 48.61)
```

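The following commands in R

```r
v_tnow %*% matrix(1:4, nrow=2, byrow=TRUE)
```

and

```r
matrix(1:4, nrow=2, byrow=TRUE) %*% v_tnow
```

return a 1 x 2 and a 2 x 1 matrix, respectively. %*% treats a vector as a row vector when it is the left operand and as a column vector when it is the right operand. The corresponding NumPy products in Python, by contrast, both return one-dimensional arrays. Consequently, the contributor should make sure that the order of the operands in the code matches the ARPM Lab.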
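The snippet below uses only base R. It checks the dimensions returned by the two commands and shows how drop() turns a result back into a plain vector when the ARPM Lab expects one. The names m, left and right are illustrative.

```r
v_tnow <- c(14.24, 48.61)
m <- matrix(1:4, nrow = 2, byrow = TRUE)

left <- v_tnow %*% m   # v_tnow promoted to a 1 x 2 row matrix
right <- m %*% v_tnow  # v_tnow promoted to a 2 x 1 column matrix

dim(left)   # 1 2
dim(right)  # 2 1

# drop() removes the extents of length one, giving a plain vector of length 2
right_vec <- drop(right)
```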

Depending on the situation, optimized techniques may be used, such as

```r
solve(cv_z, z - e_z)
```

to compute $$(\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})$$ for a very large and ill-conditioned $$\mathbb{C}v\{\boldsymbol{Z}\}$$. See section Code optimization for details.
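Here is a minimal check, on a small hypothetical covariance matrix, that solve(cv_z, z - e_z) returns the same vector as the explicit inverse. For a well-conditioned matrix like this one the two agree. Solving the linear system directly typically pays off only for large or ill-conditioned matrices.

```r
set.seed(42)
n_ <- 3
a <- matrix(rnorm(n_ * n_), n_, n_)
cv_z <- crossprod(a) + diag(n_)   # hypothetical symmetric positive definite covariance
e_z <- c(0.5, -1, 2)              # hypothetical expectation
z <- c(1, 0, 1)                   # hypothetical realization

# Solve cv_z %*% w = z - e_z directly, without forming the inverse
w_solve <- solve(cv_z, z - e_z)

# Same quantity through the explicit inverse (typically less accurate if cv_z is ill-conditioned)
w_inv <- drop(solve(cv_z) %*% (z - e_z))
```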

### Dynamic matrix algebra

Consider the dynamic case where you need to multiply an $$\bar{m}\times\bar{n}$$ matrix $$\boldsymbol{b}$$ with $$\bar{\jmath}$$ scenarios of an $$\bar{n}$$-dimensional variable $$\boldsymbol{x}^{(j)}$$. In other words, you want to transform the scenarios as $\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{b}\boldsymbol{x}^{(j)}\}_{j=1}^{\bar{\jmath}}$. Then, according to the table in section Variables dimensions, the variable x is a j_ x n_ matrix and the variable b is an m_ x n_ matrix. In such cases, the R code should read

```r
x_bar <- x %*% t(b)
```

This returns a j_ x m_ matrix whose j-th row is $$\boldsymbol{b}\boldsymbol{x}^{(j)}$$.
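The sketch below uses random illustrative inputs. It verifies that each row of x %*% t(b) equals $$\boldsymbol{b}\boldsymbol{x}^{(j)}$$ computed scenario by scenario.

```r
set.seed(7)
j_ <- 5; n_ <- 3; m_ <- 2
x <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)  # j_ x n_ scenarios
b <- matrix(rnorm(m_ * n_), nrow = m_, ncol = n_)  # m_ x n_ matrix

x_bar <- x %*% t(b)   # j_ x m_: row j is b x^(j)

# Scenario-by-scenario definition: sapply stacks each b %*% x[j, ] as a column
x_loop <- t(sapply(seq_len(j_), function(j) b %*% x[j, ]))
```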

### Inputs

If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, both of the calls below should be valid for the simulate_normal function:

```r
simulate_normal(0, 1, 100)
simulate_normal(c(0, 0), diag(c(1,1)), 100)
```
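The ARPM implementation of simulate_normal is not reproduced here. The function simulate_normal_sketch below is a hypothetical sketch of how one function can accept both scalar and multivariate inputs. It coerces the variance to a matrix with as.matrix(). It returns a vector of length j_ in the univariate case and a j_ x n_ matrix otherwise, consistent with the table in section Variables dimensions. The scenarios are drawn through the Cholesky factor of the covariance; this is one common choice, not necessarily the ARPM one.

```r
# Hypothetical sketch, not the ARPM implementation
simulate_normal_sketch <- function(mu, sigma2, j_) {
  mu <- as.vector(mu)
  sigma2 <- as.matrix(sigma2)       # a scalar variance becomes a 1 x 1 matrix
  n_ <- length(mu)
  stopifnot(all(dim(sigma2) == c(n_, n_)))
  z <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)  # j_ x n_ standard normals
  # chol() returns upper-triangular r with t(r) %*% r = sigma2,
  # so the rows of z %*% r have covariance sigma2
  x <- z %*% chol(sigma2) + matrix(mu, nrow = j_, ncol = n_, byrow = TRUE)
  if (n_ == 1) drop(x) else x       # univariate case: vector of length j_
}

set.seed(1)
x_uni <- simulate_normal_sketch(0, 1, 100)
x_biv <- simulate_normal_sketch(c(0, 0), diag(c(1, 1)), 100)
```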


### Outputs

The contributor must make sure that the output has the correct dimensions and type, whatever the shape of the input.
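A common source of shape bugs in R is that subsetting a matrix silently drops dimensions of length one. Here is a minimal sketch, with a hypothetical helper first_columns(), of how drop = FALSE keeps the output a matrix regardless of the input:

```r
x <- matrix(1:6, nrow = 3, ncol = 2)  # e.g. j_ = 3 scenarios of n_ = 2 variables

dim(x[, 1])                # NULL: the column is dropped to a plain vector
dim(x[, 1, drop = FALSE])  # 3 1: the result stays a j_ x 1 matrix

# Hypothetical helper: always returns a nrow(x) x k matrix, also for k = 1
first_columns <- function(x, k) {
  x[, seq_len(k), drop = FALSE]
}

dim(first_columns(x, 1))   # 3 1
```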

## Code optimization

Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.

Optimized techniques should not be used when the speed/accuracy gain is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code

```r
solve(sigma_sq, diag(5))
```

brings little or no speed/accuracy gain, because sigma_sq is a small matrix of known size. In this case, the code must be

```r
solve(sigma_sq)
```

On the other hand, to invert a large ill-conditioned matrix $$\boldsymbol{\sigma}^2$$ and multiply it with a matrix (vector) $$\boldsymbol{v}$$, i.e. to compute $$(\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}$$, the optimized technique

```r
solve(sigma_sq, v)
```

should be used.

If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.