Ngme2 - A new Flexible R Package for Latent non-Gaussian Models

Introduction

In this vignette we provide a brief introduction to the ngme2 package.

Ngme2 (https://github.com/davidbolin/ngme2) is the updated version of Ngme, a package for estimating latent non-Gaussian models for repeated measurement data. Ngme2 follows a hierarchical structure in which the different components (latent processes and different types of noise) can be flexibly changed and combined.

1 Features

Support temporal models like AR(1), Ornstein-Uhlenbeck and random walk processes, and spatial models like Matern fields.
Support latent processes driven by non-Gaussian noise (normal inverse Gaussian (NIG), generalized asymmetric Laplace (GAL)).
Support non-Gaussian and correlated measurement noise.
Support prediction at unobserved locations.
Support latent processes and random-effects model for longitudinal data.
Support the bivariate type-G model, which can model 2 non-Gaussian fields jointly (Bolin 2020).
Support the separable space-time model.

2 Model Framework

The package Ngme2 provides methods for mixed-effects models of the following form:

$$ {\bf Y}_{ij} = {\bf X}^T_{ij} {\bf \beta} + {\bf D}^T_{ij} {\bf U}_i + W_i(t_{ij}) + \epsilon_{ij}, \qquad j=1, \ldots, n_i, \quad i=1,\ldots,m $$

$m$ is the number of subjects, $n_i$ is the number of observations for subject $i$,
$Y$ is the response variable,
${\bf X}$ is the matrix of fixed effects explanatory variables,
${\bf \beta}$ is the vector of fixed effects,
${\bf D}$ is the matrix of random effects explanatory variables,
$\bf U$ is the vector of random effects,
$W_i(t_{ij})$ is a stochastic process driven by Gaussian or non-Gaussian noise,
$\epsilon$ is the measurement error.

Here is a simple template for using the core function ngme to model a single response:

ngme(
  formula = Y ~ x1 + x2 + f(index, model = "ar1", noise = noise_nig()),
  data = data.frame(Y = Y, x1 = x1, x2 = x2, index = index),
  family = noise_normal()
)

Here, the function f specifies the stochastic process W driven by Gaussian or non-Gaussian noise, which is discussed later. The family argument specifies the measurement noise distribution. In this case, the model has a Gaussian likelihood.

3 Non-Gaussian Model

Here we assume the non-Gaussian process is a type-G Lévy process, whose increments can be represented as location-scale mixtures: $\gamma + \mu V + \sigma \sqrt{V}Z,$ where $\gamma, \mu, \sigma$ are parameters, $Z\sim N(0,1)$ is independent of $V$, and $V$ is a positive infinitely divisible random variable. This results in the following form, where $K$ is the operator part:

$KW|V \sim N(\gamma + \mu V, \sigma^2 \, \text{diag}(V)),$ where $\mu$ and $\sigma$ can be non-stationary.

One example in ngme2 is the normal inverse Gaussian (NIG) noise, where $V$ follows an inverse Gaussian distribution with parameter $\nu$, that is, $V \sim \text{IG}(\nu, \nu)$.

A random variable $V$ follows an inverse Gaussian distribution with parameters $\eta_1$ and $\eta_2$, denoted by $V\sim \text{IG}(\eta_1,\eta_2)$, if it has probability density function (pdf) given by $\pi(v) = \frac{\sqrt{\eta_2}}{\sqrt{2\pi v^3}} \exp\left\{-\frac{\eta_1}{2}v - \frac{\eta_2}{2v} + \sqrt{\eta_1\eta_2}\right\},\quad \eta_1,\eta_2>0.$ We can generate samples from an inverse Gaussian distribution with parameters $\eta_1$ and $\eta_2$ by generating samples from the generalized inverse Gaussian distribution with parameters $p=-1/2$, $a=\eta_1$ and $b=\eta_2$. The rGIG function can be used to generate samples from the generalized inverse Gaussian distribution.
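As a cross-check on this parameterization, the following base-R sketch draws from $\text{IG}(\eta_1,\eta_2)$ without using ngme2. It assumes the usual mean/shape form of the inverse Gaussian, with mean $m=\sqrt{\eta_2/\eta_1}$ and shape $\lambda=\eta_2$ (which reproduces the density above), and uses the transformation-with-rejection method of Michael, Schucany and Haas. The function name rig_sketch is hypothetical and is not part of ngme2.

```r
# Hypothetical helper (not part of ngme2): draw n samples from IG(eta1, eta2)
rig_sketch <- function(n, eta1, eta2) {
  m <- sqrt(eta2 / eta1)   # mean of V in the mean/shape form
  lambda <- eta2           # shape parameter
  y <- rnorm(n)^2          # chi-square(1) variates
  # smaller root of the quadratic associated with the transformation
  x <- m + m^2 * y / (2 * lambda) -
    m / (2 * lambda) * sqrt(4 * m * lambda * y + m^2 * y^2)
  u <- runif(n)
  # keep the smaller root with probability m / (m + x), else use m^2 / x
  ifelse(u <= m / (m + x), x, m^2 / x)
}

set.seed(1)
eta <- 2
v <- rig_sketch(1e5, eta, eta)
c(mean = mean(v), var = var(v))  # theory: mean 1, variance 1 / eta
```

With $\eta_1=\eta_2=\eta$ the sample mean should be close to 1 and the sample variance close to $1/\eta$.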

If $V\sim \text{IG}(\eta_1,\eta_2)$, and $X = \gamma +\mu V + \sigma \sqrt{V}Z$, with $Z\sim N(0,1)$ independent of $V$, then $X$ follows a normal inverse Gaussian (NIG) distribution with pdf $\pi(x) = \frac{e^{\sqrt{\eta_1\eta_2}+\mu(x-\gamma)/\sigma^2}\sqrt{\eta_2\mu^2/\sigma^2+\eta_1\eta_2}}{\pi\sqrt{\eta_2\sigma^2+(x-\gamma)^2}} K_1\left(\sqrt{(\eta_2\sigma^2+(x-\gamma)^2)(\mu^2/\sigma^4+\eta_1/\sigma^2)}\right),$ where $K_1$ is a modified Bessel function of the third kind. In this form, the NIG density is overparameterized, so we set $\eta_1=\eta_2=\eta$, which results in $E(V)=1$. Thus, we have the parameters $\mu$, $\gamma$, $\sigma$, and $\eta$.
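The density above can be checked numerically in base R. The sketch below assumes $\eta_1=\eta_2=\eta$ and evaluates the pdf on the log scale with an exponentially scaled Bessel function for numerical stability. The function name dnig_sketch and the parameter values are illustrative assumptions, not part of ngme2.

```r
# Hypothetical helper (not part of ngme2): NIG density with eta1 = eta2 = eta
dnig_sketch <- function(x, gamma, mu, sigma, eta) {
  q <- eta * sigma^2 + (x - gamma)^2
  z <- sqrt(q * (mu^2 / sigma^4 + eta / sigma^2))
  # besselK(..., expon.scaled = TRUE) returns exp(z) * K_1(z)
  log_dens <- eta + mu * (x - gamma) / sigma^2 +
    0.5 * log(eta * mu^2 / sigma^2 + eta^2) -
    log(pi) - 0.5 * log(q) +
    log(besselK(z, nu = 1, expon.scaled = TRUE)) - z
  exp(log_dens)
}

gamma <- 0.5; mu <- 1; sigma <- 1; eta <- 2
dens <- function(x) dnig_sketch(x, gamma, mu, sigma, eta)

total <- integrate(dens, -Inf, Inf)$value                   # should be 1
m1 <- integrate(function(x) x * dens(x), -Inf, Inf)$value   # E(X)
c(total = total, mean = m1)  # theory: total 1, mean gamma + mu * E(V)
```

Since $E(V)=1$ under this parameterization, the mean of $X$ is $\gamma+\mu$.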

The NIG model assumes that the stochastic variance $V_i$ follows an inverse Gaussian distribution with parameters $\eta$ and $\eta h_i^2$ , where $h_i = \int_{\mathcal{D}} \varphi_i(\mathbf{s}) d\mathbf{s}.$
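To make $h_i$ concrete, here is a minimal sketch that assumes a one-dimensional mesh with piecewise-linear (hat) basis functions. That choice of basis is an assumption for illustration, and the node locations are made up. For such a basis, $h_i$ is half the total length of the intervals adjacent to node $i$.

```r
# Hypothetical 1-D mesh nodes (illustrative values)
nodes <- c(0, 0.5, 1.5, 2, 4)
d <- diff(nodes)                 # interval lengths between neighbouring nodes

# Integral of the i-th hat function: half of each adjacent interval
h <- (c(0, d) + c(d, 0)) / 2
h
sum(h)                           # the h_i add up to the length of the domain
```

Under the $\text{IG}(\eta_1,\eta_2)$ parameterization above, the mean is $\sqrt{\eta_2/\eta_1}$, so $V_i \sim \text{IG}(\eta, \eta h_i^2)$ has $E(V_i) = h_i$.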

library(fmesher)
library(ngme2)
#> This is ngme2 of version 0.7.0
#> - See our homepage: https://davidbolin.github.io/ngme2 for more details.
#> 
#> Attaching package: 'ngme2'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(ggplot2)
library(plyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:plyr':
#> 
#>     arrange, count, desc, failwith, id, mutate, rename, summarise,
#>     summarize
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(viridis)
#> Loading required package: viridisLite

4 Parameter Estimation

Ngme2 does maximum likelihood estimation through preconditioned stochastic gradient descent.
Multiple chains are run in parallel for better convergence checks.

See Model estimation and prediction for more details.
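The general idea of preconditioned stochastic gradient descent with several chains can be illustrated on a toy problem. The sketch below is not ngme2's internal algorithm. It estimates the mean and standard deviation of Gaussian data from mini-batches, uses the inverse Fisher information as the preconditioner, and runs the chains one after another rather than in parallel. All names and tuning values are assumptions chosen for illustration.

```r
# Toy illustration only: this is NOT ngme2's internal algorithm.
# Estimate theta = (m, log s) of N(m, s^2) by preconditioned SGD.
set.seed(42)
y <- rnorm(2000, mean = 3, sd = 2)

run_chain <- function(theta, n_iter = 1000, batch = 50) {
  for (t in seq_len(n_iter)) {
    yb <- sample(y, batch)               # mini-batch gives a noisy gradient
    m <- theta[1]
    s2 <- exp(2 * theta[2])
    # gradient of the average negative log-likelihood w.r.t. (m, log s)
    grad <- c(-mean(yb - m) / s2, 1 - mean((yb - m)^2) / s2)
    # preconditioner: inverse Fisher information, diag(s^2, 1/2)
    precond <- c(s2, 0.5)
    step <- 0.1 / (1 + t / 100)          # decaying step size
    theta <- theta - step * precond * grad
  }
  c(m = theta[1], s = exp(theta[2]))
}

# Four chains started from different initial values of (m, log s)
starts <- list(c(0, 0), c(5, 1), c(-2, 0.5), c(1, 1.5))
chains <- sapply(starts, run_chain)
chains                  # one column per chain
apply(chains, 1, sd)    # small spread across chains suggests convergence
```

Comparing the final values across chains is the kind of convergence check that running multiple chains makes possible.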

Ngme Model Structure

Specify the driving noise

Two types of noise are involved in the model: the innovation noise of a stochastic process and the measurement noise of the observations. Both can be specified with a noise_<type> function.

For now we support normal, NIG, and GAL noises.

Noise objects belong to the R class ngme_noise and can be created as follows:

library(fmesher)
library(splancs)
library(lattice)
library(ggplot2)
library(grid)
library(gridExtra)
library(viridis)
library(ngme2)

noise_normal(sigma = 1)              # normal noise
#> Noise type: NORMAL
#> Noise parameters: 
#>     sigma = 1
noise_nig(mu = 1, sigma = 2, nu = 1) # nig noise
#> Noise type: NIG
#> Noise parameters: 
#>     mu = 1
#>     sigma = 2
#>     nu = 1
noise_nig(            # non-stationary nig noise
  B_mu=matrix(c(1:10), ncol=2),
  theta_mu = c(1, 2),
  B_sigma=matrix(c(1:10), ncol=2),
  theta_sigma = c(1,2),
  nu = 1)
#> Noise type: NIG
#> Noise parameters: 
#>     theta_mu = 1, 2
#>     theta_sigma = 1, 2
#>     nu = 1

Additionally, ngme2 provides a special combined noise model that merges both Gaussian and NIG noise components:

noise_normal_nig(
  mu = 2,             # NIG parameter mu
  sigma_nig = 3,      # NIG noise scale
  nu = 1,             # NIG shape parameter
  sigma_normal = 0.8  # Normal noise standard deviation
)
#> Noise type: NORMAL_NIG
#> Noise parameters: 
#>     mu = 2
#>     sigma_nig = 3
#>     nu = 1
#>     sigma_normal = 0.8

This combined noise model is particularly useful for complex processes where both normal variations and heavy-tailed events occur. For more details, see the dedicated vignette: vignette("normal-nig-noise", package = "ngme2").

The third example above is the non-stationary NIG noise, where $\mu = \bf B_{\mu} \bf \theta_{\mu}$, and $\sigma = \exp(\bf B_{\sigma} \bf \theta_{\sigma})$. The underlying constructor ngme_noise has the following interface:

ngme_noise(
  type,           # the type of noise
  theta_mu,       # mu parameter
  theta_sigma,    # sigma parameter
  nu,             # nu parameter
  B_mu,           # basis matrix for non-stationary mu
  B_sigma         # basis matrix for non-stationary sigma
)

It will construct the following noise structure:

$- \mathbf{\mu} + \mathbf{\mu} V + \mathbf{\sigma} \sqrt{V} Z$

where $\mu = \bf B_{\mu} \bf \theta_{\mu}$, and $\sigma = \exp(\bf B_{\sigma} \bf \theta_{\sigma})$. In this case, we can recover Gaussian noise by setting type = "normal" and ignoring theta_mu and nu. Alternatively, we can simply use the helper function noise_normal(sigma = 1).
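The following base-R sketch spells out this construction without calling ngme2. The basis matrices, coefficients, and mixing variables are made-up values for illustration only.

```r
set.seed(7)
n <- 5
# Hypothetical basis matrices: an intercept column plus one covariate
B_mu    <- cbind(1, seq(0, 1, length.out = n))
B_sigma <- cbind(1, seq(-1, 1, length.out = n))
theta_mu    <- c(1, 2)
theta_sigma <- c(0, 0.5)

mu    <- as.numeric(B_mu %*% theta_mu)             # linear in theta_mu
sigma <- as.numeric(exp(B_sigma %*% theta_sigma))  # log-linear, so sigma > 0

# One draw of the noise, given mixing variables V (fixed here for illustration)
V <- c(0.5, 1, 2, 1, 0.8)
Z <- rnorm(n)
noise <- -mu + mu * V + sigma * sqrt(V) * Z
cbind(mu, sigma, V, noise)
```

Wherever $V_i = 1$, the terms $-\mu_i + \mu_i V_i$ cancel and the noise reduces to $\sigma_i Z_i$. The exponential link keeps $\sigma$ positive for any value of $\theta_{\sigma}$.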

Specify stochastic process with `f` function

The middle layer is the stochastic process, which is represented in the R interface by the f function. The process can be driven by different noise structures. See ?ngme_model_types() for more details.

Some examples of using f function to specify ngme_model:

ngme2::f(1:10, model = "ar1", noise = noise_nig())
#> Model type: AR(1)
#>     rho = 0
#> Noise type: NIG
#> Noise parameters: 
#>     mu = 0
#>     sigma = 1
#>     nu = 1

One particularly useful model is the SPDE model with Gaussian or non-Gaussian noise; see the vignette for details.

Specifying latent models with formula in `ngme`

The latent model can be specified additively in the formula argument of the ngme function, together with the fixed effects.

Latent models are written in standard R formula syntax, using f within the formula.

For example, the following formula

formula <- Y ~ x1 + f(
    x2,
    model = "ar1",
    noise = noise_nig(),
    theta_K = 0.5
  ) + f(1:5,
    model = "rw1",
    circular = T,
    noise = noise_normal()
  )

corresponds to the model

$Y = \beta_0 + \beta_1 x_1 + W_1(x_2) + W_2(x_3) + \epsilon,$ where $W_1$ is an AR(1) process indexed by $x_2$ and $W_2$ is a first-order random walk indexed by $x_3$ (here the index 1:5). An intercept is included by default. The distribution of the measurement error $\epsilon$ is given in the ngme function.

The entire model can be fitted, along with the specification of the distribution of the measurement error through the ngme function:

ngme(
  formula = formula,
  family = noise_normal(sigma = 0.5),
  data = data.frame(Y = 1:5, x1 = 2:6, x2 = 3:7),
  control_opt = control_opt(
    estimation = FALSE
  )
)
#> *** Ngme object ***
#> 
#> Fixed effects: 
#> (Intercept)          x1 
#>       7.365      -0.869 
#> 
#> Models: 
#> $field1
#>   Model type: AR(1)
#>       rho = 0
#>   Noise type: NIG
#>   Noise parameters: 
#>       mu = 0
#>       sigma = 1
#>       nu = 1
#> 
#> $field2
#>   Model type: Random walk (order 1)
#>       No parameter.
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 1
#> 
#> Measurement noise: 
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 0.5

The call returns an ngme object, which has three parts:

Fixed effects (intercept and x1)
Measurement noise (normal noise)
Latent models (contains 2 models, ar1 and rw1)

Setting estimation = TRUE in control_opt starts the estimation of the model.

A simple example - AR1 process with nig noise

Now let’s see an example of an AR1 process with nig noise. The process is defined as

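$$W_i = \rho W_{i-1} + \epsilon_i,$$ where $\epsilon_1, \ldots, \epsilon_n$ are iid NIG noise. It is easy to verify that $$ K{\bf W} = \boldsymbol\epsilon,$$ where $$K = \begin{bmatrix} \sqrt{1-\rho^2} & & & \\ -\rho & 1 & & \\ & \ddots & \ddots & \\ & & -\rho & 1 \end{bmatrix}.$$ The first row of $K$ encodes the starting condition $\sqrt{1-\rho^2}\, W_1 = \epsilon_1$, and the remaining rows encode the recursion for $i = 2, \ldots, n$.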
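The identity $K{\bf W} = \boldsymbol\epsilon$ can be verified numerically in base R. The sketch below uses Gaussian noise as a stand-in for the NIG noise, since the identity does not depend on the noise distribution. The values of n and rho are arbitrary.

```r
set.seed(123)
n <- 6
rho <- 0.5
eps <- rnorm(n)   # Gaussian stand-in; in the example eps is NIG noise

# Build K: sqrt(1 - rho^2) in the top-left corner, ones on the rest of
# the diagonal, and -rho on the first subdiagonal
K <- diag(n)
K[1, 1] <- sqrt(1 - rho^2)
K[cbind(2:n, 1:(n - 1))] <- -rho

# Simulate W by the recursion, with W_1 matching the first row of K
W <- numeric(n)
W[1] <- eps[1] / sqrt(1 - rho^2)
for (i in 2:n) W[i] <- rho * W[i - 1] + eps[i]

all.equal(as.numeric(K %*% W), eps)   # TRUE: K W recovers the noise
```

Because $K$ is sparse and lower bidiagonal, solving $K{\bf W} = \boldsymbol\epsilon$ for ${\bf W}$ is cheap, which is one reason to write the process in operator form.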

n_obs <- 500
sigma_eps <- 0.5   # standard deviation of the measurement noise
alpha <- 0.5       # AR(1) coefficient (rho)
mu <- 2
delta <- -mu       # location shift, so that the noise has mean zero
sigma <- 3
nu <- 1

# First we generate V. V_i follows an inverse Gaussian distribution
trueV <- ngme2::rig(n_obs, nu, nu, seed = 10)

# Then generate the NIG noise
mynoise <- delta + mu * trueV + sigma * sqrt(trueV) * rnorm(n_obs)
# AR(1) recursion: W_i = alpha * W_{i-1} + noise_i
trueW <- Reduce(function(x, y) y + alpha * x, mynoise, accumulate = TRUE)
Y <- trueW + rnorm(n_obs, mean = 0, sd = sigma_eps)

# Add some fixed effects
x1 <- runif(n_obs)
x2 <- rexp(n_obs)
beta <- c(-3, -1, 2)
X <- model.matrix(Y ~ x1 + x2)  # design matrix
Y <- as.numeric(Y + X %*% beta)

Now let’s fit the model using ngme. Here we can use control_opt to modify the control variables for ngme. See ?control_opt for more options.

# Fit the model with the AR1 model
ngme_out <- ngme(
  Y ~ x1 + x2 + f(
    1:n_obs,
    name = "my_ar",
    model = "ar1",
    noise = noise_nig()
  ),
  data=data.frame(x1=x1, x2=x2, Y=Y),
  control_opt = control_opt(
    burnin = 100,
    iterations = 1000,
    std_lim = 0.4,
    n_parallel_chain = 4,
    stop_points = 10,
    print_check_info = FALSE,
    seed = 3,
    sampling_strategy = "ws"
    # verbose = T
  )
)
#> Starting estimation... 
#> 
#> Starting posterior sampling... 
#> Posterior sampling done! 
#> Note:
#>       1. Use ngme_post_samples(..) to access the posterior samples.
#>       2. Use ngme_result(..) to access different latent models.

Next we can read the result directly from the object.

ngme_out
#> *** Ngme object ***
#> 
#> Fixed effects: 
#> (Intercept)          x1          x2 
#>       -2.89       -1.33        2.02 
#> 
#> Models: 
#> $my_ar
#>   Model type: AR(1)
#>       rho = 0.565
#>   Noise type: NIG
#>   Noise parameters: 
#>       mu = 1.95
#>       sigma = 2.99
#>       nu = 0.929
#> 
#> Measurement noise: 
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 0.503

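As we can see, the model converges in 350 iterations. The estimates are close to the true parameters: rho = 0.565 (true 0.5), mu = 1.95 (true 2), sigma = 2.99 (true 3), nu = 0.929 (true 1), and a measurement noise sigma of 0.503 (true 0.5). The fixed effects (-2.89, -1.33, 2.02) are likewise close to the true values (-3, -1, 2).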

We can also use the traceplot function to see the estimation traceplot.

traceplot(ngme_out, "my_ar")

Parameters of the AR1 model

#> Last estimates:
#> $rho
#> [1] 0.5647597
#> 
#> $mu
#> [1] 1.951176
#> 
#> $sigma
#> [1] 2.989668
#> 
#> $nu
#> [1] 0.9351831

We can also compare the density of the estimated noise with that of the true NIG noise:

# ngme_out$replicates[[1]] refers to the first replicate
plot(
  ngme_out$replicates[[1]]$models[[1]]$noise,
  noise_nig(mu = mu, sigma = sigma, nu = nu)
)

Paraná dataset

The rainfall data from Paraná (Brazil) is collected by the National Water Agency in Brazil (Agencia Nacional de Águas, ANA, in Portuguese). ANA collects data from many locations over Brazil, and all these data are freely available from the ANA website (http://www3.ana.gov.br/portal/ANA).

We briefly illustrate the commands used and the estimation results.

library(INLA)
#> Loading required package: Matrix
#> This is INLA_25.04.16 built 2025-04-16 08:17:54 UTC.
#>  - See www.r-inla.org/contact-us for how to get help.
#>  - List available models/likelihoods/etc with inla.list.models()
#>  - Use inla.doc(<NAME>) to access documentation
#>  - Consider upgrading R-INLA to testing[25.06.22-1] or stable[25.06.07] (require R-4.5)
#> 
#> Attaching package: 'INLA'
#> The following object is masked from 'package:ngme2':
#> 
#>     f
data(PRprec)
data(PRborder)

# Create mesh
coords <- as.matrix(PRprec[, 1:2])
prdomain <- fmesher::fm_nonconvex_hull(coords, -0.03, -0.05, resolution = c(100, 100))
prmesh <- fmesher::fm_mesh_2d(boundary = prdomain, max.edge = c(0.45, 1), cutoff = 0.2)

# monthly mean at each location
Y <- rowMeans(PRprec[, 12 + 1:31]) # 2 + October

ind <- !is.na(Y) # non-NA index
Y <- Y_mean <- Y[ind]
coords <- as.matrix(PRprec[ind, 1:2])
seaDist <- apply(spDists(coords, PRborder[1034:1078, ],
  longlat = TRUE
), 1, min)

Plot the data:

Mean of the rainfall in October 2012 in Paraná

# Define the control options
control = control_opt(
  iterations = 5000,
  n_slope_check = 4,
  stop_points = 10,
  std_lim = 0.1,
  n_parallel_chain = 4,
  print_check_info = FALSE,
  seed = 16
)

m_gauss_nig <- ngme(
  formula = Y ~ 1 +
    f(seaDist, name="rw1", model = "rw1", noise = noise_normal()) +
    f(coords, model = "matern", mesh = prmesh, name="spde", noise = noise_normal()),
  data = data.frame(Y = Y),
  family = noise_nig(),
  control_opt = control
)
#> Starting estimation... 
#> 
#> Starting posterior sampling... 
#> Posterior sampling done! 
#> Note:
#>       1. Use ngme_post_samples(..) to access the posterior samples.
#>       2. Use ngme_result(..) to access different latent models.
m_gauss_nig
#> *** Ngme object ***
#> 
#> Fixed effects: 
#> (Intercept) 
#>        8.77 
#> 
#> Models: 
#> $rw1
#>   Model type: Random walk (order 1)
#>       No parameter.
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 0.155
#> 
#> $spde
#>   Model type: Matern
#>       kappa = 31.4
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 606
#> 
#> Measurement noise: 
#>   Noise type: NIG
#>   Noise parameters: 
#>       mu = 0.434
#>       sigma = 1.96
#>       nu = 1.53

# traceplots
## fixed effects and measurement error
traceplot(m_gauss_nig)

#> Last estimates:
#> $mu
#> [1] 0.4405591
#> 
#> $sigma
#> [1] 1.95535
#> 
#> $nu
#> [1] 1.541247
#> 
#> $`fixed effect 1`
#> [1] 8.771377

## spde model
traceplot(m_gauss_nig, "spde")

#> Last estimates:
#> $kappa
#> [1] 31.3967
#> 
#> $sigma
#> [1] 605.8377

Parameter estimation results:

#> Warning in data.frame(intercept = format(m_gauss_nig$replicates[[1]]$feff, :
#> row names were found from a short variable and have been discarded

Estimations for the model
intercept	noise_mu	noise_sigma	noise_nu	rw_sigma	ma_kappa	ma_sigma
8.77	0.434	1.96	1.53	-10.00	31.4	606
8.77	0.434	1.96	1.53	-1.87	31.4	606

Similarly, we can fit several alternative models:

m_gauss_gauss <- ngme(
  formula = Y ~ 1 +
    f(seaDist, name="rw1", model = "rw1", noise = noise_normal()) +
    f(coords, model = "matern", mesh = prmesh, name="spde", noise = noise_normal()),
  data = data.frame(Y = Y),
  family = noise_normal(),
  control_opt = control
)
#> Starting estimation... 
#> 
#> Starting posterior sampling... 
#> Posterior sampling done! 
#> Note:
#>       1. Use ngme_post_samples(..) to access the posterior samples.
#>       2. Use ngme_result(..) to access different latent models.

m_nig_gauss <- ngme(
  formula = Y ~ 1 +
    f(seaDist, name="rw1", model = "rw1", noise = noise_nig()) +
    f(coords, model = "matern", mesh = prmesh, name="spde", noise = noise_normal()),
  data = data.frame(Y = Y),
  family = noise_normal(),
  control_opt = control
)
#> Starting estimation... 
#> 
#> Starting posterior sampling... 
#> Posterior sampling done! 
#> Note:
#>       1. Use ngme_post_samples(..) to access the posterior samples.
#>       2. Use ngme_result(..) to access different latent models.

m_nig_nig <- ngme(
  formula = Y ~ 1 +
    f(seaDist, name="rw1", model = "rw1", noise = noise_nig()) +
    f(coords, model = "matern", mesh = prmesh, name="spde", noise = noise_nig()),
  data = data.frame(Y = Y),
  family = noise_nig(),
  control_opt = control
)
#> Starting estimation... 
#> 
#> Starting posterior sampling... 
#> Posterior sampling done! 
#> Note:
#>       1. Use ngme_post_samples(..) to access the posterior samples.
#>       2. Use ngme_result(..) to access different latent models.

Prediction

nxy <- c(150, 100)
projgrid <- rSPDE::rspde.mesh.projector(prmesh,
  xlim = range(PRborder[, 1]),
  ylim = range(PRborder[, 2]), dims = nxy
)

xy.in <- inout(projgrid$lattice$loc, cbind(PRborder[, 1], PRborder[, 2]))

coord.prd <- projgrid$lattice$loc[xy.in, ]
plot(coord.prd, type = "p", cex = 0.1)
lines(PRborder)
points(coords[, 1], coords[, 2], pch = 19, cex = 0.5, col = "red")


seaDist.prd <- apply(spDists(coord.prd,
  PRborder[1034:1078, ],
  longlat = TRUE
), 1, min)

# predict at the new locations by passing them through the map argument
pds <- predict(m_gauss_nig, map=list(rw1=seaDist.prd, spde=coord.prd))
lp <- pds$mean
ggplot() +
  geom_point(aes(
    x = coord.prd[, 1], y = coord.prd[, 2],
    colour = lp
  ), size = 2, alpha = 1) +
  geom_point(aes(
    x = coords[, 1], y = coords[, 2],
    colour = Y_mean
  ), size = 2, alpha = 1) +
  scale_color_gradientn(colours = viridis(100)) +
  geom_path(aes(x = PRborder[, 1], y = PRborder[, 2])) +
  geom_path(aes(x = PRborder[1034:1078, 1], y = PRborder[
    1034:1078,
    2
  ]), colour = "red")

Cross-validation

We can further validate the models using cross-validation.

cv <- cross_validation(
  list(
    gauss_gauss = m_gauss_gauss,
    gauss_nig = m_gauss_nig,
    nig_gauss = m_nig_gauss,
    nig_nig = m_nig_nig
  ), 
  type = "k-fold", 
  k = 10,
  n_gibbs_samples = 1000,
  n_burnin = 200,
  seed = 20
)

# Create a basic table with knitr
cv_table <- knitr::kable(cv$mean.scores, caption = "Cross-validation results")

cv_table

Cross-validation results
	MAE	MSE	neg.CRPS	neg.sCRPS
gauss_gauss	1.726098	5.025762	1.233343	1.451464
gauss_nig	1.730894	5.072879	1.237540	1.451879
nig_gauss	1.737343	5.103335	1.244486	1.455521
nig_nig	1.781279	5.634463	1.280065	1.461403

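As the table shows, the Gaussian-Gaussian model (gauss_gauss) attains the lowest value of all four scores, while the differences among the gauss_gauss, gauss_nig, and nig_gauss models are small. Assuming that every score here is negatively oriented (lower is better), the Gaussian-Gaussian model performs best on this data set.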
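To give a sense of what such scores measure, the sketch below computes an absolute error, a squared error, and a sample-based CRPS for one held-out observation from draws of a predictive distribution. This is a generic illustration and not ngme2's implementation. The helper score_sketch is hypothetical, the CRPS is written in its negatively oriented form $E|X-y| - \tfrac{1}{2}E|X-X'|$, and the scaled CRPS is not covered.

```r
# Hypothetical helper (not ngme2's implementation): scores for one held-out
# observation y, given samples x from the predictive distribution
score_sketch <- function(x, y) {
  n <- length(x)
  xs <- sort(x)
  # mean |X - X'| over all pairs, computed from the order statistics
  mean_abs_diff <- 2 * sum((2 * seq_len(n) - n - 1) * xs) / n^2
  c(
    AE   = abs(mean(x) - y),       # absolute error of the predictive mean
    SE   = (mean(x) - y)^2,        # squared error of the predictive mean
    CRPS = mean(abs(x - y)) - 0.5 * mean_abs_diff   # lower is better
  )
}

set.seed(20)
x <- rnorm(20000)               # predictive samples from N(0, 1)
s <- score_sketch(x, y = 0)
s                               # CRPS should be near the closed form below
2 * dnorm(0) - 1 / sqrt(pi)     # exact CRPS of N(0, 1) at y = 0
```

Averaging such per-observation scores over the held-out folds gives summary values of the kind reported in the table.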

2025-07-29