ARPM Coding Standards for R
Here we describe the R coding standards that contributors must strictly follow.
Coding style
Always follow Google’s R Style Guide, except for its naming rules. For naming, follow Google’s naming convention.
In general, any variable, function or object name in the code must match the name used in the ARPM Lab. For example:
- the time series \(\{x_{t}\}_{t=1}^{\bar{t}}\) in the ARPM Lab should be called x in the code, indexed by t in 1:t_;
- the routine \(\mathit{fit\_locdisp\_mlfp\_difflength}\) in the ARPM Lab should be called fit_locdisp_mlfp_difflength in the code.
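As a small illustration of the naming rule, the sketch below stores a time series as x with length t_ and loops over its index t. The simulated values and the running-sum computation are assumptions made only for the example.

```r
# Hypothetical time series {x_t}, t = 1, ..., t_bar; the values are illustrative
set.seed(1)
t_ <- 5            # number of observations (t_bar in the ARPM Lab)
x <- rnorm(t_)     # the series keeps the ARPM Lab name x

# Index the series with t, running from 1 to t_
x_cum <- numeric(t_)
for (t in seq_len(t_)) {
  x_cum[t] <- sum(x[1:t])   # running sum up to time t
}
```

Here seq_len(t_) is used in place of 1:t_ because it also behaves correctly when t_ is zero.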
The titles of the scripts are in the format s_script_title. The script_title field should be interpretable and intuitive (e.g. not too short).
The titles of the functions are in the format function_title. The function_title field should be interpretable and intuitive (e.g. not too short).
For inline comments, please see here.
For docstrings (comments on modules, functions and classes), please see section Docstrings.
Scripts should not run other scripts, i.e. the command
source("../../../R/scripts/sources/s_script_title1.R")
is not allowed. Rather, a script s_script_title2 should import a database saved by s_script_title1. Databases must be as parsimonious and aggregated as possible, so that the same, few, clean .csv files can be called in all the case studies. See more in section Variables dimensions.
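A minimal sketch of this hand-off between scripts is given below. The file name db_script_title1.csv and the use of a temporary directory are assumptions made to keep the example self-contained; the actual database location is not specified here.

```r
# Hypothetical database path; a temporary directory keeps the sketch self-contained
db_path <- file.path(tempdir(), "db_script_title1.csv")

# --- in s_script_title1: compute a variable and save it as a database ---
x <- c(1.0, 1.5, 0.5, 2.0)     # illustrative realized process of length t_ = 4
write.csv(data.frame(x = x), db_path, row.names = FALSE)

# --- in s_script_title2: import the database instead of sourcing script 1 ---
db <- read.csv(db_path)
x_loaded <- db$x
```

The second script depends only on the saved .csv file, so it can be run without re-executing the first one.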
Scripts must be as modular as possible: any time there is a copy&paste, the contributor must evaluate the option of creating a function for those operations.
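For instance, if the same standardization were applied to several variables by copy and paste, it could be moved into one function. The helper name standardize and the data below are hypothetical.

```r
# Hypothetical helper replacing a block of code that was repeated per variable
standardize <- function(x) {
  # Parameters
  # ----------
  # x : vector, dimensions (t_ x 1)
  # Returns
  # ----------
  # x_std : vector, dimensions (t_ x 1)
  x_std <- (x - mean(x)) / sd(x)
  return(x_std)
}

x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)
x_std <- standardize(x)   # one call per variable instead of duplicated code
y_std <- standardize(y)
```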
Scripts must be as simple as possible: any time there is a need for advanced optimizations/computations, the contributor must evaluate the option of creating a function for those operations. See more in section Code optimization.
As the assignment operator, <- should be used instead of =.
Do not use attach(), in order to keep the code clear.
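Two common ways of working without attach() are sketched below; the data frame db and its columns are hypothetical.

```r
db <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))   # illustrative data

# Instead of attach(db), refer to the columns explicitly ...
ratio <- db$y / db$x

# ... or evaluate an expression inside the data frame with with()
ratio_with <- with(db, y / x)
```

Both forms make it visible where x and y come from, which is the clarity that attach() hides.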
Plots should be produced using packages from the base R distribution.
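A minimal sketch of a plot built only with base graphics follows. The simulated path is illustrative, and the null PDF device is an assumption used so that the example writes no file when run non-interactively.

```r
set.seed(1)
t_ <- 50
x <- cumsum(rnorm(t_))      # illustrative random-walk path

# pdf(file = NULL) opens a device that creates no external file
pdf(file = NULL)
plot(seq_len(t_), x, type = "l", xlab = "t", ylab = "x",
     main = "Realized process")
abline(h = mean(x), lty = 2)    # dashed line at the sample mean
invisible(dev.off())            # close the device
```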
Docstrings
Functions docstring
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:
- one link to the respective ARPM Lab Code Documentation;
- optional “See also” links;
- type and dimension of the input;
- type and dimension of the output.
# -*- coding: utf-8 -*-
single_output <- function(x,             # parameter1
                          y,             # parameter2
                          z=NULL,        # optional parameter1
                          option1='a',   # optional parameter2
                          option2='c'    # optional parameter3
                          ){
  # For details, see here.
  # Parameters
  # ----------
  # x : scalar
  # y : vector, dimensions (i_bar x 1)
  # z : matrix, optional, dimensions (i_bar x j_bar)
  # option1 : str, optional
  # option2 : str, optional
  # Returns
  # ----------
  # g : scalar
  # ## Step 1: Do this
  w <- sin(x)
  # ## Step 2: Do that
  g <- w + 3
  return(g)
}
Scripts docstring
The docstring and comments must strictly follow the template below. In particular, the docstring must only contain:
- one link to the respective ARPM Lab Code Documentation;
- optional “See also” links.
# ---
# jupyter:
# kernelspec:
# display_name: R
# language: R
# name: ir
# ---
# # s_script_name
# For details, see here.
# load function_name function
source("../../../R/functions/function_file/function_name.R")
# ## Step 1: Input parameters
# +
param1 <- 1
param2 <- 2
# -
# ## Step 2: Compute x
x <- param1 + param2
# ## Step 3: Compute y
y <- x - 1
Variables dimensions
Basic data structures in R can be organized by their dimensionality and by whether they are homogeneous or heterogeneous. The standard categorization is given in the table below. R has no 0-dimensional, or scalar, types: individual numbers or strings, which we think of as scalars, are vectors of length one.
Dimensionality | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
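The categories in the table can be checked directly in R, as in the sketch below. The variable names are arbitrary.

```r
s <- 3.14                                   # a "scalar": atomic vector of length one
v <- c(1.5, 2.5, 3.5)                       # 1d homogeneous: atomic vector
l <- list(1, "a", TRUE)                     # 1d heterogeneous: list
m <- matrix(1:6, nrow = 2)                  # 2d homogeneous: matrix
d <- data.frame(a = 1:2, b = c("u", "v"))   # 2d heterogeneous: data frame
a <- array(1:24, dim = c(2, 3, 4))          # nd homogeneous: array

is.vector(s)   # TRUE: there is no separate scalar type
length(s)      # 1
dim(v)         # NULL: a plain vector carries no dim attribute
dim(m)         # 2 3
dim(a)         # 2 3 4
```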
The standards for the R variables and CSV files are given in the table below.
Variable | Type | Length/Dimension | DB (CSV) |
---|---|---|---|
Univariate realized process | Time series - past \(\bar{t}\) steps | t_ | t_ x 1 |
Univariate random variable | \(\bar{\jmath}\) MC scenarios | j_ | j_ x 1 |
Univariate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ | j_ x u_ |
\(\bar{n}\)-variate realized process | Time series - past \(\bar{t}\) steps | t_ x n_ | t_ x n_ |
\(\bar{n}\)-variate random variable | \(\bar{\jmath}\) MC scenarios | j_ x n_ | j_ x n_ |
\(\bar{n}\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ x n_ | j_*u_ x n_ |
\((\bar{n}\times\bar{k})\)-variate realized process | Time series - past \(\bar{t}\) steps | t_ x n_ x k_ | t_ x n_*k_ |
\((\bar{n}\times\bar{k})\)-variate random variable | \(\bar{\jmath}\) MC scenarios | j_ x n_ x k_ | j_ x n_*k_ |
\((\bar{n}\times\bar{k})\)-variate random process | \(\bar{\jmath}\) MC scenarios - \(\bar{u}\) future steps | j_ x u_ x n_ x k_ | j_*u_ x n_*k_ |
Group all the dimensions into two buckets: first, those you want to be indices; then, those you want to be headers: (ind1*ind2*...*ind_i_, dim1*dim2*...*dim_n_).
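The sketch below flattens a j_ x u_ x n_ array into a j_*u_ x n_ matrix suitable for a CSV database and then restores it. The row ordering (scenario index running fastest within each future step) is an assumption that follows from R's column-major storage; no particular ordering is prescribed above.

```r
j_ <- 3
u_ <- 2
n_ <- 4
# Illustrative n_-variate random process: j_ scenarios, u_ future steps
x <- array(seq_len(j_ * u_ * n_), dim = c(j_, u_, n_))

# Bucket (j_, u_) as row indices and n_ as column headers: j_*u_ x n_
x_db <- matrix(as.vector(x), nrow = j_ * u_, ncol = n_)

# Under this layout, scenario j at step u sits in row j + (u - 1) * j_
j <- 2
u <- 2
row_ju <- j + (u - 1) * j_

# Restore the original j_ x u_ x n_ array after reading the database back
x_back <- array(as.vector(x_db), dim = c(j_, u_, n_))
```

Whatever ordering is chosen, the script that reads the database needs to use the same one to rebuild the array.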
Static matrix algebra
A vector (\(\bar{n}\times 1\)) in the ARPM Lab is represented by a basic data structure in R, a vector of length n_ (see Variables dimensions). For example \[
\boldsymbol{v}_{t_{\mathit{now}}}
\equiv
\begin{pmatrix}v_{1,t_{\mathit{now}}} \\
v_{2,t_{\mathit{now}}}
\end{pmatrix}
=
\begin{pmatrix}\$14.24 \\
\$48.61
\end{pmatrix}
\] should read in R
v_tnow <- c(14.24, 48.61).
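The following commands in R
v_tnow %*% matrix(1:4, nrow=2, byrow=TRUE)
and
matrix(1:4, nrow=2, byrow=TRUE) %*% v_tnow
do not produce results of the same dimensions as in Python: R returns a 1 x 2 matrix for the first product and a 2 x 1 matrix for the second, whereas NumPy returns a one-dimensional array of length 2 in both cases. Consequently, the contributor should take care that the order of the variables in the code matches the order in the ARPM Lab.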
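The dimensions of the two products can be inspected as follows. The name a for the 2 x 2 matrix is introduced only for this sketch.

```r
v_tnow <- c(14.24, 48.61)
a <- matrix(1:4, nrow = 2, byrow = TRUE)

left <- v_tnow %*% a     # v_tnow is treated as a row vector
right <- a %*% v_tnow    # v_tnow is treated as a column vector

dim(left)    # 1 2
dim(right)   # 2 1

# Drop the matrix structure when a plain vector of length n_ is needed
left_vec <- as.vector(left)
```

Converting back with as.vector() is one way to keep the result consistent with the convention that an \(\bar{n}\times 1\) vector is a plain R vector.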
Depending on the situation, optimized techniques may be used such as
solve(cv_z, z - e_z)
to compute \((\mathbb{C}v\{\boldsymbol{Z}\})^{-1}(\boldsymbol{z}-\mathbb{E}\{\boldsymbol{Z}\})\) for a very large and ill-conditioned \(\mathbb{C}v\{\boldsymbol{Z}\}\). See Code optimization for details.
Dynamic matrix algebra
Consider the dynamic case where you need to multiply an \(\bar{m}\times\bar{n}\) matrix \(\boldsymbol{b}\) with \(\bar{\jmath}\) scenarios of an \(\bar{n}\)-dimensional variable \(\boldsymbol{x}^{(j)}\), i.e. you want to modify the scenarios as \[
\{\bar{\boldsymbol{x}}^{(j)}\}_{j=1}^{\bar{\jmath}} \leftarrow \{\boldsymbol{bx}^{(j)}\}_{j=1}^{\bar{\jmath}}
\] Then, according to the table in section Variables dimensions, the variable x would be a j_ x n_ matrix and the variable b would be an m_ x n_ matrix. In such cases, the R code should read
x_bar <- x %*% t(b).
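The sketch below checks, on simulated data, that this single matrix product matches applying \(\boldsymbol{b}\) scenario by scenario. The dimensions and values are illustrative assumptions.

```r
set.seed(1)
j_ <- 5
n_ <- 3
m_ <- 2
x <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)   # scenarios, j_ x n_
b <- matrix(1:(m_ * n_), nrow = m_, ncol = n_)      # matrix b, m_ x n_

# Vectorized: transform all scenarios at once; the result is j_ x m_
x_bar <- x %*% t(b)

# Scenario-by-scenario check: row j of x_bar equals b %*% x[j, ]
x_bar_loop <- matrix(NA_real_, nrow = j_, ncol = m_)
for (j in seq_len(j_)) {
  x_bar_loop[j, ] <- as.vector(b %*% x[j, ])
}
```

Because scenarios are stored as rows, the transposed matrix multiplies from the right, which is why the order differs from the formula in the ARPM Lab.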
Function overloading
Inputs
If a function works with both multivariate and univariate variables, then in the univariate case it must be able to accept scalars as inputs. For example, all of the below should be valid queries for the simulate_normal function
simulate_normal(0, 1, 100)
simulate_normal(c(0, 0), diag(c(1,1)), 100)
Outputs
The contributor must make sure that the output has the correct dimensions and type regardless of the shape of the input.
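One possible way to satisfy both requirements is sketched below. This is a hypothetical implementation written for illustration, not the ARPM simulate_normal routine; the argument names mu, sigma2 and j_ and the Cholesky-based sampling are assumptions.

```r
# Hypothetical sketch; not the ARPM implementation of simulate_normal
simulate_normal <- function(mu, sigma2, j_) {
  # Parameters
  # ----------
  # mu : scalar or vector, dimensions (n_ x 1)
  # sigma2 : scalar or matrix, dimensions (n_ x n_)
  # j_ : scalar
  # Returns
  # ----------
  # x : vector of length j_ if n_ = 1, otherwise matrix, dimensions (j_ x n_)

  n_ <- length(mu)
  # A scalar variance becomes a 1 x 1 matrix, so one code path serves both cases
  sigma2 <- matrix(sigma2, nrow = n_, ncol = n_)
  # Upper-triangular Cholesky factor r, such that t(r) %*% r = sigma2
  r <- chol(sigma2)
  z <- matrix(rnorm(j_ * n_), nrow = j_, ncol = n_)   # standard normal scenarios
  x <- z %*% r + matrix(mu, nrow = j_, ncol = n_, byrow = TRUE)
  if (n_ == 1) {
    x <- as.vector(x)   # univariate case: return a vector of length j_
  }
  return(x)
}

set.seed(1)
x_uni <- simulate_normal(0, 1, 100)                       # vector of length 100
x_multi <- simulate_normal(c(0, 0), diag(c(1, 1)), 100)   # 100 x 2 matrix
```

The output shapes follow the table in section Variables dimensions: j_ for a univariate random variable and j_ x n_ for an n_-variate one.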
Code optimization
Optimized techniques may be used in cases where there is a clear advantage in speed or accuracy.
Optimized techniques should not be used when the gain in speed/accuracy is small relative to the loss of clarity. For instance, to compute the inverse of a well-conditioned 5×5 matrix, the code
solve(sigma_sq, diag(5))
brings little to no speed/accuracy gain, because sigma_sq is a small matrix of known size. In this case, the code must be
solve(sigma_sq)
On the other hand, to invert a large ill-conditioned matrix \(\boldsymbol{\sigma}^2\) and multiply it with a matrix (vector) \(\boldsymbol{v}\), i.e. to compute \((\boldsymbol{\sigma}^2)^{-1}\boldsymbol{v}\), the optimized technique
solve(sigma_sq, v)
should be used.
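The two approaches can be compared on a small ill-conditioned example, as sketched below. The 6 x 6 Hilbert matrix and the vector of ones are illustrative choices, not taken from the ARPM Lab.

```r
# Illustrative ill-conditioned matrix: the 6 x 6 Hilbert matrix
n_ <- 6
sigma_sq <- 1 / (outer(seq_len(n_), seq_len(n_), "+") - 1)
v <- rep(1, n_)

# Optimized: solve the linear system directly, without forming the inverse
w_solve <- solve(sigma_sq, v)

# Naive: invert first, then multiply
w_inv <- solve(sigma_sq) %*% v

# Largest absolute residual of sigma_sq %*% w = v for each approach
res_solve <- max(abs(sigma_sq %*% w_solve - v))
res_inv <- max(abs(sigma_sq %*% w_inv - v))
```

Comparing res_solve with res_inv on the matrix at hand gives a concrete basis for deciding whether the optimized form is worth using.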
If there is a need for “too much” optimization, then the contributor must evaluate if the optimization in the code is suitable to be discussed in detail in the ARPM Lab and escalate the issue to ARPM.
