# ARPM Coding Standards for R


Here we describe the R coding standards that the contributor is strictly committed to follow.

## Coding style

Always follow Google's R Style Guide, except for the naming rules. For the naming rules, follow Google's naming convention.

In general, any variable, function or object name in the code must follow the name presented in the ARPM Lab. For example:

- the time series \(\{x_{t}\}_{t=1}^{\bar{t}}\) in the ARPM Lab should be called `x` in the code, indexed by `t in 1:t_`;
- the routine \(\mathit{fit\_locdisp\_mlfp\_difflength}\) in the ARPM Lab should be called `fit_locdisp_mlfp_difflength` in the code.
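As a minimal sketch of the first rule, with made-up values: the series is stored in `x`, its length in `t_`, and the loop index is `t`. The name `x_cum` is hypothetical and only serves the illustration.

```r
# illustrative values only; not ARPM data
x <- c(1.2, 0.7, 1.9, 1.4)
t_ <- length(x)            # number of observations (t-bar in the ARPM Lab)

# running sum of the series, indexed by t = 1, ..., t_
x_cum <- numeric(t_)
for (t in seq_len(t_)) {   # seq_len(t_) equals 1:t_ whenever t_ >= 1
  x_cum[t] <- sum(x[1:t])
}
```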

The titles of the scripts are in the format `s_script_title`. The `script_title` field should be interpretable and intuitive (e.g. not too short).

The titles of the functions are in the format `function_title`. The `function_title` field should be interpretable and intuitive (e.g. not too short).

For inline comments, please see here.

For docstrings (comments on modules, functions and classes), please see section Docstrings.

Scripts should not run other scripts, i.e. the command

```
source("../../../R/scripts/sources/s_script_title1.R")
```

is not allowed. Rather, a script `s_script_title2` should import a database saved by `s_script_title1`. Databases must be as parsimonious and aggregated as possible, so that the same few, clean .csv files can be called in all the case studies. See more in section Variables dimension.
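The hand-off between two scripts could look like the sketch below. The file name `db_example.csv` and the use of a temporary directory are assumptions made to keep the example self-contained; a real script would read from and write to the project's database folder.

```r
# hypothetical database location; a temporary directory keeps the sketch self-contained
db_path <- file.path(tempdir(), "db_example.csv")

# --- end of s_script_title1: save the results as a database ---
t_ <- 5
x <- cumsum(rep(0.1, t_))   # illustrative univariate realized process
write.csv(data.frame(x = x), db_path, row.names = FALSE)

# --- start of s_script_title2: import the database instead of sourcing script 1 ---
db <- read.csv(db_path)
x_loaded <- db$x
```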

Scripts must be as modular as possible: any time there is a copy&paste, the contributor must evaluate the option of creating a function for those operations.

Scripts must be as simple as possible: any time there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.

As the assignment operator, `<-` should be used instead of `=`.

Do not use `attach()`, so that it is always clear where each variable comes from.
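A sketch of two explicit alternatives to `attach()`, using a small made-up data frame (the names `db`, `price` and `quantity` are illustrative): refer to columns through `$`, or evaluate an expression with `with()`.

```r
# illustrative data frame; column names are made up
db <- data.frame(price = c(10, 12, 11), quantity = c(3, 1, 2))

# explicit column access: it is always clear where `price` comes from
value_1 <- sum(db$price * db$quantity)

# with() evaluates the expression inside `db` without changing the search path
value_2 <- with(db, sum(price * quantity))
```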

Plots should be done using packages from the basic R library.
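For instance, a line plot can be produced with the `graphics` and `grDevices` packages that ship with R. The sketch below uses a made-up series and writes to a temporary PDF file (an assumption, so that it also runs non-interactively).

```r
# illustrative series; values are made up
t_ <- 50
x <- cumsum(sin(seq_len(t_) / 5))

# write the figure to a temporary PDF so the sketch also runs non-interactively
fig_path <- file.path(tempdir(), "fig_example.pdf")
pdf(fig_path)
plot(seq_len(t_), x, type = "l", xlab = "t", ylab = "x", main = "Time series x")
abline(h = mean(x), lty = 2)   # dashed line at the sample mean
dev.off()
```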

## Docstrings

### Functions docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

- one link to the respective ARPM Lab Code Documentation;
- optional “See also” links;
- type and dimension of the input;
- type and dimension of the output.

```
# -*- coding: utf-8 -*-
single_output <- function(x,              # parameter1
                          y,              # parameter2
                          z = NULL,       # optional parameter1
                          option1 = 'a',  # optional parameter2
                          option2 = 'c'   # optional parameter3
                          ) {
  # For details, see here.

  # Parameters
  # ----------
  #  x : scalar
  #  y : vector, dimensions (i_bar x 1)
  #  z : matrix, optional, dimensions (i_bar x j_bar)
  #  option1 : str, optional
  #  option2 : str, optional

  # Returns
  # ----------
  #  g : scalar

  # ## Step 1: Do this
  w <- sin(x)

  # ## Step 2: Do that
  g <- w + 3

  return(g)
}
```

### Scripts docstring

The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:

- one link to the respective ARPM Lab Code Documentation;
- optional “See also” links.

```
# ---
# jupyter:
#   kernelspec:
#     display_name: R
#     language: R
#     name: ir
# ---

# # s_script_name
# For details, see here.

# load function_name function
source("../../../R/functions/function_file/function_name.R")

# ## Step 1: Input parameters

# +
param1 <- 1
param2 <- 2
# -

# ## Step 2: Compute x

x <- param1 + param2

# ## Step 3: Compute y

y <- x - 1
```

## Variables dimension

Basic data structures in R can be organized by their dimensionality and by whether they are homogeneous or heterogeneous. The standard categorization is given in the table below. R has no 0-dimensional, or scalar, types: individual numbers or strings, which we regard as scalars, are vectors of length one.

| | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | Atomic vector | List |
| 2d | Matrix | Data frame |
| nd | Array | |
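The categories in the table can be checked directly in R. The sketch below builds one object of each kind with made-up values and records what R reports about the "scalar" case.

```r
x <- 3.5                                     # a "scalar" is an atomic vector of length one
x_is_vector <- is.vector(x)                  # TRUE
x_length <- length(x)                        # 1

v <- c(1, 2, 3)                              # 1d homogeneous: atomic vector
l <- list(1, "a", TRUE)                      # 1d heterogeneous: list
m <- matrix(1:6, nrow = 2)                   # 2d homogeneous: matrix
df <- data.frame(a = 1:2, b = c("u", "w"))   # 2d heterogeneous: data frame
a <- array(1:24, dim = c(2, 3, 4))           # nd homogeneous: array
```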

The standards for the R variables and CSV files are given in the table below.

| Variable | Type | Length/Dimension | DB (CSV) |
|---|---|---|---|
| Univariate realized process | Time series - past \(\bar{t}\) steps | `t_` | `t_ x 1` |
| Univariate random variable | \(\bar{\jmath}\) MC scenarios | `j_` | `j_ x 1` |
| Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | `j_ x u_` | `j_ x u_` |
| \(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | `t_ x n_` | `t_ x n_` |
| \(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | `j_ x n_` | `j_ x n_` |
| \(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | `j_ x u_ x n_` | `j_*u_ x n_` |
| \((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | `t_ x n_ x k_` | `t_ x n_*k_` |
| \((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | `j_ x n_ x k_` | `j_ x n_*k_` |
| \((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | `j_ x u_ x n_ x k_` | `j_*u_ x n_*k_` |

Group all the dimensions in two buckets: first, those you want to be indices, then, those you want to be headers: `(ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_)`.
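As an illustration of the grouping rule, the sketch below flattens a `j_ x u_ x n_` array into a `j_*u_ x n_` matrix ready to be written to CSV, and restores it afterwards. The row ordering is an assumption: the text above does not fix it, and here the scenario index `j` runs fastest along the rows, which is what R's column-major storage gives by default.

```r
j_ <- 3; u_ <- 2; n_ <- 4
# illustrative scenarios of an n_-variate random process: j_ x u_ x n_ array
x <- array(seq_len(j_ * u_ * n_), dim = c(j_, u_, n_))

# flatten to j_*u_ x n_ for the CSV
# assumption: scenario index j runs fastest along the rows, so that
# row j + (u - 1) * j_ of x_db holds x[j, u, ]
x_db <- matrix(x, nrow = j_ * u_, ncol = n_)

# restore the original array after reading the database back
x_back <- array(x_db, dim = c(j_, u_, n_))
```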

### Static matrix algebra

A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented as a basic structure type in R, a vector of length `n_` (see Variables dimension). For example \[
\boldsymbol{v}_{t_{\mathit{now}}}
\equiv
\begin{pmatrix}v_{1,t_{\mathit{now}}} \\
v_{2,t_{\mathit{now}}}
\end{pmatrix}
=
\begin{pmatrix}\$14.24 \\
\$48.61
\end{pmatrix}
\] should read in R

`v_tnow <- c(14.24, 48.61)`

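The following commands in R

`v_tnow %*% matrix(1:4, nrow=2, byrow=TRUE)`

and

`matrix(1:4, nrow=2, byrow=TRUE) %*% v_tnow`

will not produce results of the same dimensions as in Python: R promotes `v_tnow` to a row vector in the first case and to a column vector in the second, so the results are a `1 x 2` and a `2 x 1` matrix, respectively. Consequently, the contributor should take care that the order of the variables in the code matches the order in the ARPM Lab.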
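The dimensions can be checked directly. In the sketch below the name `a` for the 2 x 2 matrix is an illustrative assumption.

```r
v_tnow <- c(14.24, 48.61)
a <- matrix(1:4, nrow = 2, byrow = TRUE)

left <- v_tnow %*% a      # v_tnow is treated as a 1 x 2 row vector
right <- a %*% v_tnow     # v_tnow is treated as a 2 x 1 column vector

dim_left <- dim(left)     # c(1, 2)
dim_right <- dim(right)   # c(2, 1)

# drop the matrix structure when a plain vector of length n_ is needed
right_vec <- as.vector(right)
```

Both results are matrices, so `as.vector()` (or `drop()`) is needed before passing them to code that expects a vector of length `n_`.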

Depending on the situation, optimized techniques may be used such as

`solve(cv_z, z - e_z)`

to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.

### Dynamic matrix algebra

Consider the dynamic case where you need to multiply a \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of a \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[
\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{bx}^{(j)}\}_{j=1}^{\bar{\jmath}}
\] Then, according to the table in section Variables dimension, the variable `x` would be a `j_ x n_` matrix and the variable `b` would be an `m_ x n_` matrix. In such cases, the R code should read

`x_bar <- x %*% t(b)`
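A short sketch with simulated scenarios (sizes and values are made up) confirms that each row of `x %*% t(b)` equals the scenario-by-scenario product of `b` with that scenario.

```r
j_ <- 1000; n_ <- 3; m_ <- 2
set.seed(0)
x <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)   # j_ x n_ scenarios
b <- matrix(1:(m_ * n_), nrow = m_, ncol = n_)      # m_ x n_ matrix

x_bar <- x %*% t(b)                                 # j_ x m_ transformed scenarios

# check one scenario against the scenario-by-scenario product b %*% x^(j)
j <- 7
x_bar_j <- as.vector(b %*% x[j, ])
```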

## Function overloading
### Inputs

If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, both of the calls below should be valid for the `simulate_normal` function:

```
simulate_normal(0, 1, 100)
simulate_normal(c(0, 0), diag(c(1,1)), 100)
```

### Outputs

The contributor must *make sure* that the output has the correct dimensions and type, no matter what the shape of the input is.
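One way to meet both requirements is to coerce the inputs to a common shape, run a single code path, and reshape the output at the end. The sketch below is a hypothetical implementation, not the ARPM `simulate_normal`; the argument names `mu`, `sigma2` and `j_`, and the use of a Cholesky factor, are assumptions.

```r
# Hypothetical sketch: not the ARPM implementation of simulate_normal.
simulate_normal <- function(mu, sigma2, j_) {
  mu <- as.vector(mu)
  n_ <- length(mu)
  # coerce a scalar variance to a 1 x 1 matrix so one code path serves both cases
  sigma2 <- matrix(sigma2, nrow = n_, ncol = n_)
  # upper-triangular Cholesky factor: t(r) %*% r equals sigma2
  r <- chol(sigma2)
  # j_ x n_ matrix of independent standard normal draws
  z <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)
  # correlate the draws, then shift each column by the matching entry of mu
  x <- sweep(z %*% r, 2, mu, "+")
  # univariate case: return a vector of length j_ rather than a j_ x 1 matrix
  if (n_ == 1) {
    x <- as.vector(x)
  }
  return(x)
}

x_uni <- simulate_normal(0, 1, 100)                       # vector of length j_
x_multi <- simulate_normal(c(0, 0), diag(c(1, 1)), 100)   # j_ x n_ matrix
```

The output shapes follow the table in section Variables dimension: length `j_` in the univariate case and `j_ x n_` in the multivariate case.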

## Code optimization

Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.

Optimized techniques should not be used when the speed/accuracy gain is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code

`solve(sigma_sq, diag(5))`

brings little to no speed/accuracy gain, because `sigma_sq` is a small matrix of a known size. In this case, the code must be

`solve(sigma_sq)`

On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique

`solve(sigma_sq, v)`

should be used.
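The two approaches can be compared on a small example. The matrix below is made up and well conditioned, so both give the same answer up to rounding; the point is only to show the two call patterns side by side.

```r
n_ <- 4
# illustrative symmetric positive definite matrix and right-hand side
set.seed(1)
a <- matrix(rnorm(n_ * n_), nrow = n_)
sigma_sq <- crossprod(a) + diag(n_)
v <- c(1, 2, 3, 4)

# explicit inverse followed by a product: returns an n_ x 1 matrix
w_inv <- solve(sigma_sq) %*% v

# direct solution of the linear system sigma_sq %*% w = v: returns a vector
w_solve <- solve(sigma_sq, v)
```

Note that the two calls also differ in output shape, so the choice affects the dimension checks discussed in section Function overloading.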

If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.
