Introduction

Time series analysis is often used to model temporal dependencies in data, where observations at different points in time are correlated.

The autoregressive model of order 1 (AR(1)) provides a simple but widely used temporal framework in which each value is expressed as a function of the preceding value plus a random error term. Most AR(1) implementations traditionally assume that the random errors follow a normal (Gaussian) distribution, and these implementations are typically fitted using packages like forecast or stats.

The ngme2 package extends traditional AR(1) models by allowing for non-Gaussian noise distributions, particularly the Normal Inverse Gaussian (NIG) distribution: a flexible distribution allowing for skewness and heavy tails, which can capture the asymmetric and heavy-tailed behavior commonly observed in real-world time series data. The package also supports the Generalized Asymmetric Laplace (GAL) distribution: a flexible distribution accommodating peakedness, skewness, and heavy tails.

Additionally, ngme2 enables the use of AR(1) processes as latent models within linear regression frameworks with mixed effects. This capability allows researchers to model complex hierarchical data structures where temporal dependencies exist at the latent level, while simultaneously accounting for fixed effects and other random effects. This integration makes it possible to capture both the temporal correlation structure through the AR(1) latent process and the heterogeneity across different groups or experimental units through mixed effects modeling.

This vignette provides a comprehensive guide to implementing and using AR(1) models within the ngme2 framework.

Model Specification

AR(1) Process Definition

An autoregressive model of order 1 (AR(1)) is a stochastic process where each observation depends linearly on the immediately preceding observation plus a random error term. The AR(1) process is mathematically defined as:

$$
\begin{aligned}
X_1 &\sim \mathcal{D}\left(0,\,\frac{\sigma^2}{1-\rho^2}\right), \\
X_i &= \rho X_{i-1} + \delta_i, \quad i = 2, 3, \ldots, n,
\end{aligned}
$$

where:

  • $\mathcal{D}(0, v)$ denotes a zero-mean distribution with variance $v$ (e.g., Gaussian, NIG, or GAL);
  • $\rho$ is the autoregressive parameter, with $|\rho| < 1$ (stationarity condition);
  • $\delta_i \sim \text{i.i.d.}(0, \sigma^2)$ are independent and identically distributed innovation terms with variance $\sigma^2$;
  • $n$ is the total number of observations.

Note on Innovation Distributions: For non-Gaussian innovations, mean and variance do not fully specify the distribution, but are sufficient for the classical exposition of AR models:

  • For NIG innovations: $\delta_i$ follows a Normal Inverse Gaussian distribution with additional parameters (location, scale, asymmetry, and tail heaviness), set to ensure zero mean and variance $\sigma^2$;
  • For GAL innovations: $\delta_i$ follows a Generalized Asymmetric Laplace distribution with extra parameters, also set to ensure zero mean and variance $\sigma^2$.

Note on Markov Property: The AR(1) process exhibits the first-order Markov property, meaning that the conditional distribution of any observation $X_i$ given the entire past depends only on the immediate predecessor $X_{i-1}$. Formally, for any $i \geq 2$:

$$P(X_i \leq x \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \leq x \mid X_{i-1})$$

This property emerges directly from the recursive structure $X_i = \rho X_{i-1} + \delta_i$ combined with the independence of the innovations $\delta_i$. Since $\delta_i$ is independent of all past values $(X_1, \ldots, X_{i-1})$ and $X_i$ is completely determined by $X_{i-1}$ and $\delta_i$, knowledge of the entire history provides no additional information beyond $X_{i-1}$ for predicting $X_i$. This Markov structure is fundamental to the computational efficiency of AR(1) models, as it implies that the precision matrix has a sparse, banded structure that enables efficient numerical algorithms.

Note on Initial Condition: Writing the initial value as $X_1 = \delta_1/\sqrt{1-\rho^2}$, the scaling factor $\frac{1}{\sqrt{1-\rho^2}}$ ensures that $X_1$ has the correct marginal variance for stationarity. Specifically, $$\text{Var}(X_1) = \frac{1}{1-\rho^2} \text{Var}(\delta_1) = \frac{\sigma^2}{1-\rho^2}.$$ This guarantees that the initial variance matches the stationary distribution of the AR(1) process, regardless of the value of $\sigma^2$.

Properties

Stationarity Condition: The constraint $|\rho| < 1$ ensures that the process is stationary, meaning its statistical properties remain constant over time.

Variance Structure: In the stationary distribution, all observations have the same marginal variance $\text{Var}(X_i) = \frac{\sigma^2}{1-\rho^2}$.

Autocorrelation Function: For a stationary AR(1) process, the autocorrelation function exhibits exponential decay: $\text{Corr}(X_t, X_{t+k}) = \rho^k$, where $k$ is the lag. This characteristic pattern distinguishes AR(1) processes from other time series models.

Partial Autocorrelation Function: The partial autocorrelation function (PACF) for an AR(1) process has a distinctive pattern: $$\text{PACF}(k) = \begin{cases} \rho & \text{if } k = 1 \\ 0 & \text{if } k > 1 \end{cases}$$ This cutoff after lag 1 is a key diagnostic feature for identifying AR(1) processes in practice.

Noise Distributions: In ngme2, the innovation terms $\delta_i$ can follow one of the following distributions:

  • Gaussian distribution: Traditional approach with symmetric, light-tailed errors;

  • Normal Inverse Gaussian (NIG) distribution: Flexible distribution allowing for skewness and heavy tails;

  • Generalized Asymmetric Laplace (GAL) distribution: Flexible distribution accommodating peakedness, skewness, and heavy tails.

Irregular Time Grids: The ngme2 package supports AR(1) processes on irregular time grids, allowing for gaps in the time series. This is particularly useful for modeling data with missing time points or irregular sampling intervals.

Matrix Representation

The AR(1) process can be expressed in matrix form as $$\mathbf{KX} = \boldsymbol{\delta},$$ where $\mathbf{X} = (X_1,\ldots, X_n)^T$, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)^T$, and $\mathbf{K}$ is a matrix that encodes the AR(1) structure.

The matrix $\mathbf{K}$ is: $$\mathbf{K} = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}$$

Justification of Matrix Construction: The first row with $\sqrt{1-\rho^2}$ ensures that $X_1$ has the correct marginal variance for stationarity. The subsequent rows implement the recursive relationship $X_i = \rho X_{i-1} + \delta_i$ by rearranging it as $-\rho X_{i-1} + X_i = \delta_i$.

The matrix $\mathbf{K}$ relates the correlated AR(1) process values $\mathbf{X}$ to the independent noise terms $\boldsymbol{\delta}$. This structure allows us to express the precision matrix (inverse of the covariance matrix) of the AR(1) process as $\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigma^2$, meaning that $\mathbf{K}$ serves as a Cholesky-like factor of $\mathbf{Q}$. Furthermore, the resulting precision matrix inherits the sparse structure from the Markov property of AR(1) processes, leading to significant computational advantages in terms of both memory usage and processing speed, particularly when dealing with long time series.

Precision Matrix

For the AR(1) process with the above matrix $\mathbf{K}$ and error variance $\sigma^2$, the precision matrix is:

$$\mathbf{Q} = \frac{\mathbf{K}^T \mathbf{K}}{\sigma^2} = \frac{1}{\sigma^2}\begin{bmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 \\ 0 & -\rho & 1+\rho^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}$$

Interpretation: The precision matrix $\mathbf{Q}$ encodes the conditional independence structure of the AR(1) process. Its tridiagonal structure reflects the Markov property: each observation is conditionally independent of all others given its immediate neighbors. The first and last diagonal entries equal $1$ rather than $1+\rho^2$ because the boundary points have only one neighbor each.
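To make this concrete, the following standalone sketch builds $\mathbf{K}$ and $\mathbf{Q}$ directly with the Matrix package; the values of rho, sigma, and n below are illustrative and unrelated to the models fitted later in this vignette:

# Sketch: construct K and Q = K^T K / sigma^2 for a small AR(1) example
library(Matrix)
rho <- 0.6; sigma <- 1; n <- 5
K <- bandSparse(n, n, k = c(-1, 0),
                diagonals = list(rep(-rho, n - 1),
                                 c(sqrt(1 - rho^2), rep(1, n - 1))))
Q <- crossprod(K) / sigma^2
# Diagonal of Q is c(1, 1 + rho^2, ..., 1 + rho^2, 1); off-diagonals are -rho
print(Q)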

Matrix Relationships Summary

  • Matrix $\mathbf{K}$: Maps the correlated AR(1) process to independent noise (innovation form)
  • Precision matrix $\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigma^2$: Inverse of the covariance matrix
  • Covariance matrix $\mathbf{C} = \mathbf{Q}^{-1}$: Contains the covariances between all pairs of time points

This matrix formulation not only establishes a direct connection to the precision matrix $\mathbf{Q}$ by computing the innovations from the process values, but also integrates seamlessly with other ngme2 components. Factorizing the AR(1) process in this way yields both conceptual clarity and a simpler implementation, especially in high-dimensional settings, because all computations can exploit the sparse precision matrix rather than a dense covariance matrix. This is the general structure of the latent models supported by the ngme2 package.

Implementation in ngme2

Basic Syntax

To specify an AR(1) model in ngme2, use the f() function with the model="ar1" argument within your model formula: f(time_index, model="ar1"). If the noise argument is not specified in f(), the process defaults to Gaussian noise. For simulation purposes, it is important to set the sigma parameter in the noise specification, since it controls the variance of the innovations. For estimation, however, specifying sigma is optional, since the model will estimate this parameter from the observed data.
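The following minimal sketch illustrates these cases; t_idx is a placeholder integer time index and the parameter values are arbitrary:

# Minimal sketch of the f() syntax for AR(1) models
t_idx <- 1:10
f(t_idx, model = "ar1")                                      # Gaussian noise by default
f(t_idx, model = "ar1", noise = noise_normal(sigma = 1.0))   # fix sigma, e.g. for simulation
f(t_idx, model = "ar1", rho = 0.7, noise = noise_nig())      # NIG noise, parameters estimated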

Basic Usage Example

When specifying the time grid, keep in mind that the AR(1) process in ngme2 requires integer time indices, although these do not need to be consecutive (gaps are allowed). This flexibility is particularly useful for modeling time series with irregular observation patterns or missing data points.

For illustration, we will create an AR(1) model using an irregular time grid with integer years. This demonstrates how ngme2 handles non-consecutive time points:

library(ngme2)
#> This is ngme2 of version 0.7.0
#> - See our homepage: https://davidbolin.github.io/ngme2 for more details.
#> 
#> Attaching package: 'ngme2'
#> The following object is masked from 'package:stats':
#> 
#>     ar
set.seed(16)

# Define time points with gaps (irregular spacing)
time_points <- c(2001, 2003, 2005, 2007)

Observe that in ngme2, it is possible to have gaps in the time grid, and the observed years can be provided in any order. For example, in this case, data for 2002, 2004, and 2006 are missing. The AR(1) model will still maintain the temporal dependence structure across these gaps. Internally, ngme2 automatically constructs the sorted, complete time grid (e.g., 2001, 2002, ..., 2007) as needed for the precision matrix, so the user does not need to handle this manually. Consequently, the precision matrix will always have dimensions determined by the complete time grid, while an observation matrix A is used to map the observed time points to the corresponding points on the complete grid.

Now we will specify the AR(1) model with a negative autocorrelation parameter, which creates an oscillating pattern in which high values tend to be followed by low values:

# Define AR(1) model with negative autocorrelation
ar1_model <- f(time_points,
  model = "ar1",
  rho = -0.5,            # Negative correlation creates oscillating behavior
  noise = noise_normal(sigma = 1.0)    # Gaussian innovations with sigma = 1
)

The ngme2 package constructs several important matrices internally. We can examine the structure of the matrix $\mathbf{K}$ that relates the AR(1) process to the independent noise terms:

# Examine the K matrix structure
K_matrix <- ar1_model$operator$K
print(K_matrix)
#> 7 x 7 sparse Matrix of class "dgCMatrix"
#>                                     
#> [1,] 0.8660254 .   .   .   .   .   .
#> [2,] 0.5000000 1.0 .   .   .   .   .
#> [3,] .         0.5 1.0 .   .   .   .
#> [4,] .         .   0.5 1.0 .   .   .
#> [5,] .         .   .   0.5 1.0 .   .
#> [6,] .         .   .   .   0.5 1.0 .
#> [7,] .         .   .   .   .   0.5 1

The matrix $\mathbf{K}$ shows the characteristic AR(1) structure:

  • The first row contains $\sqrt{1-\rho^2}=\sqrt{1-(-0.5)^2}=\sqrt{0.75}\approx 0.866$
  • Subsequent rows have the pattern $[-\rho, 1] = [0.5, 1]$, representing the AR(1) recursion

The mapping matrix $\mathbf{A}$ tells us how the internal time grid indices correspond to our specified time points:

# Check the mapping matrix A
mapping_matrix <- ar1_model$A
print(mapping_matrix)
#> 4 x 7 sparse Matrix of class "dgCMatrix"
#>                   
#> [1,] 1 0 . . . . .
#> [2,] . . 1 0 . . .
#> [3,] . . . . 1 0 .
#> [4,] . . . . . 0 1

The mapping matrix $\mathbf{A}$ serves as a crucial bridge between the observed time indices provided by the user and the internal computational structure of the AR(1) model. When working with consecutive time indices, this matrix typically reduces to an identity matrix, providing a direct one-to-one correspondence. Its true utility becomes apparent when dealing with irregular time grids, where it handles the mapping between the observed, user-specified time points and the complete time grid used internally for efficient computation. This mapping mechanism allows ngme2 to seamlessly accommodate missing observations, irregular sampling intervals, and complex temporal patterns without compromising the fundamental AR(1) structure.

# Load Matrix package for sparse matrix operations
library(Matrix)

# The precision matrix is computed as Q = (K^T K) / sigma^2
sigma2 <- 1.0  # Error variance from the model definition

Q_matrix <- (t(K_matrix) %*% K_matrix) / sigma2
print(Q_matrix)
#> 7 x 7 sparse Matrix of class "dgCMatrix"
#>                                      
#> [1,] 1.0 0.50 .    .    .    .    .  
#> [2,] 0.5 1.25 0.50 .    .    .    .  
#> [3,] .   0.50 1.25 0.50 .    .    .  
#> [4,] .   .    0.50 1.25 0.50 .    .  
#> [5,] .   .    .    0.50 1.25 0.50 .  
#> [6,] .   .    .    .    0.50 1.25 0.5
#> [7,] .   .    .    .    .    0.50 1.0

This matrix formulation provides several computational benefits, particularly in terms of memory efficiency and processing speed. Let us examine the sparsity pattern of the precision matrix in the illustration above, where the 7 time grid points (years spanning 2001 to 2007) are associated with our 4 irregularly spaced observed time points. This sparsity is crucial for computational efficiency:

# Check sparsity (important for computational efficiency)
Q_dense <- as.matrix(Q_matrix)
total_elements <- nrow(Q_matrix) * ncol(Q_matrix)
zero_elements <- sum(abs(Q_dense) < 1e-10)
sparsity <- zero_elements / total_elements
print(paste("Sparsity of precision matrix:", round(sparsity * 100, 1), "%"))
#> [1] "Sparsity of precision matrix: 61.2 %"

The resulting precision matrix maintains its characteristic tri-diagonal structure (with modification in the first row due to the stationarity condition), demonstrating that the temporal gaps do not compromise the fundamental sparsity properties that make AR(1) models computationally tractable. This sparsity is particularly valuable when scaling to longer time series, as it ensures that computational complexity grows linearly rather than quadratically with the number of observations.

Observe that with only 7 time grid points (induced by our four observed time points), the resulting precision matrix already exhibits substantial sparsity. As the number of points in the time grid increases, this sparsity becomes even more pronounced, which greatly enhances computational efficiency. This scaling property makes AR(1) models in ngme2 particularly well-suited for long time series applications.
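As a back-of-the-envelope sketch (the grid size below is an arbitrary choice, not part of the example above), the fraction of zero entries in a tridiagonal precision matrix approaches 100% as the grid grows:

# Sketch: sparsity of an n x n tridiagonal precision matrix
n_grid <- 1000
nonzero_entries <- n_grid + 2 * (n_grid - 1)   # main diagonal plus the two off-diagonals
sparsity_large <- 1 - nonzero_entries / n_grid^2
print(paste("Sparsity for n =", n_grid, ":", round(sparsity_large * 100, 2), "%"))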

Finally, let us demonstrate how different error variances affect the precision matrix:

# Demonstrate how different error variances affect the precision matrix
sigma2_values <- c(0.5, 2.0, 4.0)
for(sigma2_test in sigma2_values) {
  Q_scaled <- (t(K_matrix) %*% K_matrix) / sigma2_test
  print(paste("With sigma^2 =", sigma2_test, 
              ", Q[1,1] =", round(Q_scaled[1,1], 3),
              "(scaled by factor 1/", sigma2_test, ")"))
}
#> [1] "With sigma^2 = 0.5 , Q[1,1] = 2 (scaled by factor 1/ 0.5 )"
#> [1] "With sigma^2 = 2 , Q[1,1] = 0.5 (scaled by factor 1/ 2 )"
#> [1] "With sigma^2 = 4 , Q[1,1] = 0.25 (scaled by factor 1/ 4 )"

The resulting precision matrix $\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigma^2$ is sparse, which is computationally advantageous for large time series. This sparsity reflects the Markov property of AR(1) processes: each observation depends only on its immediate neighbors, not on distant past values. The scaling by $1/\sigma^2$ ensures that the precision matrix correctly represents the inverse covariance structure, with larger error variances resulting in lower precision (higher uncertainty) and vice versa. When defining AR(1) models in practice, it is important to specify the error variance explicitly (e.g., noise_normal(sigma = 1.0)) to maintain consistency between the theoretical formulation and the computational implementation.

Simulation

Understanding AR(1) model behavior through simulation helps build intuition about temporal dependencies and the impact of different noise distributions. The ngme2 package provides straightforward simulation capabilities that allow us to explore these characteristics systematically.

Setting Up the Simulation Environment

First, let us prepare our environment and define the basic parameters for our simulations:

library(ngme2)
library(gridExtra)
library(dplyr)
set.seed(456)

# Define simulation parameters
n_obs <- 2500
time_index <- 1:n_obs

Gaussian AR(1) Simulation

We begin with the traditional Gaussian AR(1) model, which assumes normally distributed innovations. This serves as our baseline for comparison. First, we create the AR(1) model object using the f() function, which specifies the model structure, parameters, and noise distribution. The f() function returns a model object that contains all the necessary information for simulation and estimation, including the temporal dependence structure ($\rho = 0.7$) and the Gaussian noise specification ($\sigma = 2.0$).

# Define AR(1) model with Gaussian noise
ar1_gaussian <- f(time_index, 
  model = "ar1", 
  rho = 0.7,
  noise = noise_normal(sigma = 2.0)
)

In the second step, we obtain sample realizations from this model using the simulate() method, which generates random samples according to the specified AR(1) process. The simulate() method takes the model object as input along with optional arguments such as the random seed (seed = 456) for reproducibility and the number of simulations (nsim = 1). This two-step approach separates model specification from sample generation, providing flexibility for multiple simulations or parameter exploration.

# Generate realization from the AR(1) model
W_gaussian <- simulate(ar1_gaussian, seed = 456, nsim = 1)[[1]]

To examine the temporal structure of the simulated Gaussian AR(1) process, we first calculate its autocorrelation (ACF) and partial autocorrelation (PACF) functions. These are essential diagnostic tools for verifying the specified temporal dependence:

# Prepare data for ggplot
gaussian_data <- data.frame(
  time = time_index,
  value = W_gaussian
)

# Calculate autocorrelation function
acf_result <- acf(W_gaussian, lag.max = 20, plot = FALSE)
acf_data <- data.frame(
  lag = 0:20,
  acf = as.numeric(acf_result$acf)
)

# Calculate partial autocorrelation function
pacf_result <- pacf(W_gaussian, lag.max = 20, plot = FALSE)
pacf_data <- data.frame(
  lag = 1:20,
  pacf = as.numeric(pacf_result$acf)
)

Now we will visualize the simulated process and its correlation structure. The time series plot shows the temporal evolution of the process, while the ACF and PACF plots reveal the characteristic patterns that confirm our AR(1) model specification:
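The plotting code is not shown in this vignette output; a minimal sketch of how such figures could be produced (assuming ggplot2 is attached in addition to gridExtra; the aesthetics are illustrative choices) is:

# Sketch of the time series, ACF, and PACF figures described above
library(ggplot2)
p_ts <- ggplot(gaussian_data, aes(time, value)) +
  geom_line(color = "steelblue") +
  labs(title = "Simulated Gaussian AR(1) process", x = "Time", y = "Value")
p_acf <- ggplot(acf_data, aes(lag, acf)) +
  geom_segment(aes(xend = lag, yend = 0)) +
  labs(title = "ACF", x = "Lag", y = "Autocorrelation")
p_pacf <- ggplot(pacf_data, aes(lag, pacf)) +
  geom_segment(aes(xend = lag, yend = 0)) +
  labs(title = "PACF", x = "Lag", y = "Partial autocorrelation")
gridExtra::grid.arrange(p_ts, p_acf, p_pacf, ncol = 1)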

The autocorrelation function shows the expected exponential decay pattern characteristic of AR(1) processes, while the partial autocorrelation function exhibits the diagnostic cutoff after lag 1, confirming the model's temporal structure. The PACF pattern is particularly useful for identifying the order of autoregressive processes.
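As a quick numerical check (a sketch reusing the acf_data frame created above), the empirical autocorrelations can be compared with the theoretical values $\rho^k$ for $\rho = 0.7$:

# Compare the empirical ACF with the theoretical AR(1) decay 0.7^k at the first lags
head(data.frame(
  lag = acf_data$lag,
  empirical = round(acf_data$acf, 3),
  theoretical = round(0.7^acf_data$lag, 3)
))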

NIG AR(1) Simulation

Now let us explore the flexibility of non-Gaussian innovations using the Normal Inverse Gaussian (NIG) distribution, which can capture asymmetry and heavy tails. Here we create an AR(1) model object with NIG innovations using the f() function. The NIG distribution is specified through noise_nig() with three parameters: mu = -3 (location parameter introducing asymmetry), sigma = 4 (scale parameter controlling spread), and nu = 0.4 (shape parameter governing tail heaviness).

# Define AR(1) model with NIG noise
ar1_nig <- f(time_index, 
  model = "ar1", 
  rho = 0.5,
  noise = noise_nig(mu = -3, sigma = 4, nu = 0.4)
)

We then generate a sample realization using the simulate() method with a different random seed (seed = 16) so that the samples are independent of our Gaussian example. The resulting time series will exhibit the characteristic features of NIG innovations while maintaining the AR(1) temporal correlation structure specified by $\rho = 0.5$.

# Generate realization from the NIG AR(1) model
W_nig <- simulate(ar1_nig, seed = 16, nsim = 1)[[1]]

We will now perform the same correlation analysis for the NIG AR(1) process. This allows us to see how the non-Gaussian innovation distribution affects the temporal correlation structure relative to the Gaussian case:

# Prepare data for NIG process
nig_data <- data.frame(
  time = time_index,
  value = W_nig
)

# Calculate ACF for NIG process
acf_nig <- acf(W_nig, lag.max = 20, plot = FALSE)
acf_nig_data <- data.frame(
  lag = 0:20,
  acf = as.numeric(acf_nig$acf)
)

# Calculate and plot partial autocorrelation function for NIG process
pacf_nig <- pacf(W_nig, lag.max = 20, plot = FALSE)
pacf_nig_data <- data.frame(
  lag = 1:20,
  pacf = as.numeric(pacf_nig$acf)
)

Next, we visualize the NIG AR(1) process and its correlation functions. These plots will demonstrate that while the innovation distribution is non-Gaussian, the temporal correlation structure remains consistent with AR(1) theory, showing the robustness of the autoregressive framework across different noise distributions:

The NIG AR(1) process exhibits similar temporal correlation structure to the Gaussian case, as evidenced by the exponential decay in the ACF and the cutoff after lag 1 in the PACF. However, the non-Gaussian innovations introduce different marginal behavior, allowing for asymmetric and heavy-tailed distributions while preserving the fundamental AR(1) temporal dynamics.

Comparing Distributions

A key advantage of the NIG distribution is its ability to model asymmetric and heavy-tailed behavior. To better understand the differences between Gaussian and NIG innovations, we will prepare the data for a direct comparison of the marginal distributions. This analysis will highlight how the choice of noise distribution affects the overall behavior of the AR(1) process:

# Combine data for comparison
comparison_data <- data.frame(
  value = c(W_gaussian, W_nig),
  type = rep(c("Gaussian", "NIG"), each = n_obs)
)

Now we create comparative visualizations to examine the distributional differences between the two AR(1) processes. The histogram shows the overall shape and asymmetry, while the box plots highlight differences in central tendency, spread, and the presence of outliers:
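The plotting code is again not shown; a minimal sketch (ggplot2 assumed attached; bin width and colors are arbitrary choices) is:

# Sketch of the distributional comparison described above
library(ggplot2)
p_hist <- ggplot(comparison_data, aes(value, fill = type)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 60) +
  labs(title = "Marginal distributions", x = "Value", y = "Count", fill = NULL)
p_box <- ggplot(comparison_data, aes(type, value, fill = type)) +
  geom_boxplot(show.legend = FALSE) +
  labs(title = "Box plots", x = NULL, y = "Value")
gridExtra::grid.arrange(p_hist, p_box, ncol = 2)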

The comparison reveals how the NIG distribution captures different tail behavior and potential asymmetry compared to the Gaussian case.

Estimation

Model estimation in ngme2 involves fitting AR(1) models to observed data using advanced optimization techniques. We will demonstrate this process using our simulated NIG data combined with fixed effects and measurement noise.

Preparing the Data

Let us create a realistic dataset with fixed effects, latent AR(1) structure, and measurement noise:

# Define fixed effects and covariates
feff <- c(-1, 2, 0.5)
x1 <- runif(n_obs)
x2 <- rexp(n_obs)
x3 <- rnorm(n_obs, mean = 1, sd = 0.5)
X <- model.matrix(~ 0 + x1 + x2 + x3)

# Create response variable with fixed effects, latent process, and measurement noise
Y <- as.numeric(X %*% feff) + W_nig + rnorm(n_obs, sd = 2)

# Prepare data frame
model_data <- data.frame(x1 = x1, x2 = x2, x3 = x3, Y = Y)

Model Specification and Fitting

We specify our model using the ngme() function, incorporating both fixed effects and the latent AR(1) process. The ngme() function follows R's standard modeling approach by accepting a formula object that defines the relationship between the response variable and both fixed and latent effects:

# Fit the model
ngme_out <- ngme(
  Y ~ 0 + x1 + x2 + x3 + f(
    1:n_obs,
    name = "my_ar",
    model = "ar1",
    noise = noise_nig(),
    control = control_f(numer_grad = TRUE, improve_hessian = FALSE)
  ),
  data = model_data,
  control_opt = control_opt(
    optimizer = adam(),
    burnin = 100,
    iterations = 1000,
    std_lim = 0.01,
    n_parallel_chain = 4,
    stop_points = 10,
    verbose = FALSE,
    print_check_info = FALSE,
    seed = 3
  )
)

The formula Y ~ 0 + x1 + x2 + x3 + f(...) specifies that the response variable Y depends on three fixed effects (x1, x2, x3) with no intercept (the 0 term), plus a latent AR(1) process defined through the f() function. The latent model is specified with f(1:n_obs, name = "my_ar", model = "ar1", noise = noise_nig()), where we assign a name to the process for later reference and specify NIG innovations without fixing the parameters (allowing them to be estimated).

The control_opt() function manages the optimization process through several key parameters:

  • optimizer = adam(): Uses the Adam optimizer, an adaptive gradient-based algorithm well-suited for complex likelihood surfaces
  • iterations = 1000: Sets the maximum number of optimization iterations
  • burnin = 100: Specifies initial iterations for algorithm stabilization before convergence assessment
  • n_parallel_chain = 4: Runs 4 parallel optimization chains to improve convergence reliability and parameter estimation robustness
  • std_lim = 0.01: Sets the convergence criterion based on parameter standard deviation across chains
  • stop_points = 10: Number of intermediate convergence checks during optimization
  • seed = 3: Ensures reproducible results across runs

This optimization setup balances computational efficiency with estimation reliability, particularly important for complex models with non-Gaussian latent processes. Let us examine the fitted model results:

# Display model summary
ngme_out
#> *** Ngme object ***
#> 
#> Fixed effects: 
#>     x1     x2     x3 
#> -0.926  2.042  0.513 
#> 
#> Models: 
#> $my_ar
#>   Model type: AR(1)
#>       rho = 0.481
#>   Noise type: NIG
#>   Noise parameters: 
#>       mu = -2.9
#>       sigma = 3.93
#>       nu = 0.409
#> 
#> Measurement noise: 
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 2.01

Convergence Diagnostics

Assessing convergence is crucial for reliable parameter estimates. We will use the built-in traceplot function from ngme2 to examine the convergence of our parameters:

  # Trace plot for sigma and fixed effects (blue lines indicate true value)
  traceplot(ngme_out, hline = c(2,-1, 2, 0.5), moving_window = 20)

#> Last estimates:
#> $sigma
#> [1] 1.981825
#> 
#> $`fixed effect 1`
#> [1] -0.9254609
#> 
#> $`fixed effect 2`
#> [1] 2.042302
#> 
#> $`fixed effect 3`
#> [1] 0.5131561

Now let us examine the convergence of the AR(1) process parameters:

  # Trace plots for AR(1) parameters (blue lines indicate true value)
  traceplot(ngme_out, "my_ar", hline = c(0.5, -3, 4, 0.4))

#> Last estimates:
#> $rho
#> [1] 0.4805299
#> 
#> $mu
#> [1] -2.899516
#> 
#> $sigma
#> [1] 3.939029
#> 
#> $nu
#> [1] 0.410303

Parameter Estimates Summary

Let us examine the final parameter estimates and compare them with the true values using the ngme_result() function:

# Parameter Estimates for Fixed Effects and Measurement Noise
params_data <- ngme_result(ngme_out, "data")

# Fixed Effects
print(data.frame(
  Truth = c(-1, 2, 0.5),
  Estimates = as.numeric(params_data$fixed_effects),
  row.names = names(params_data$fixed_effects)
))
#>    Truth  Estimates
#> x1  -1.0 -0.9258435
#> x2   2.0  2.0423156
#> x3   0.5  0.5129309

# Measurement Noise
print(data.frame(Truth = 2.0, Estimate = params_data$sigma, row.names = "sigma"))
#>       Truth Estimate
#> sigma     2 2.007037
cat("\n")

# AR(1) Process Parameters
params_ar1 <- ngme_result(ngme_out, "my_ar")

print(data.frame(
  Truth = c(rho = 0.5, mu = -3.0, sigma = 4.0, nu = 0.4),
  Estimates = unlist(params_ar1)
))
#>       Truth  Estimates
#> rho     0.5  0.4812054
#> mu     -3.0 -2.8967196
#> sigma   4.0  3.9349919
#> nu      0.4  0.4090970

Model Validation

Let us validate our model by comparing the estimated noise distribution with the true distribution:

# Distribution Validation: Fitted vs. True NIG Noise Model
with(params_ar1, {
  plot(
    noise_nig(mu = mu, sigma = sigma, nu = nu),
    noise_nig(mu = -3, sigma = 4, nu = 0.4)
  )
})

Prediction

Once we have fitted an AR(1) model using ngme2, we can use it to make predictions for new observations. The predict() method provides flexible prediction capabilities, allowing us to forecast future values, interpolate missing observations, or predict at new covariate combinations.

Hold-out Validation Procedure

Hold-out validation demonstrates the technical procedure for temporal prediction by splitting the data into training and testing portions. This approach tests the model's ability to predict beyond the fitted time period while maintaining proper temporal ordering.

We begin by configuring the data split, reserving a small number of observations at the end of the time series for testing:

# Configuration for hold-out validation
n_holdout <- 5
n_train <- n_obs - n_holdout

# Split data maintaining temporal order
model_data_train <- model_data[1:n_train, ]
model_data_test <- model_data[(n_train+1):n_obs, ]
Y_true_test <- Y[(n_train+1):n_obs]

cat("Training:", n_train, "observations\n")
#> Training: 2495 observations
cat("Testing:", n_holdout, "observations\n")
#> Testing: 5 observations

Fitting Model on Training Data

The model is fitted using only the training portion of the data. The time index in the f() function must correspond exactly to the training data size to maintain the proper temporal structure. However, we also need to make sure that the test locations are included in the training model's mesh. To this end, we first create a model object:

ar1_train <- f(
    1:n_train,
    name = "my_ar",
    model = "ar1",
    noise = noise_nig(),
    control = control_f(numer_grad = TRUE, improve_hessian = FALSE)
  )

We now need to extract the training locations, which are in ar1_train$operator$mesh$loc. Next, we create a mesh object with all training and test locations. We will need to load the fmesher package to create the mesh object.

library(fmesher)
loc_train <- ar1_train$operator$mesh$loc
loc_test <- (n_train+1):n_obs
loc_full <- c(loc_train, loc_test)
mesh_full <- fm_mesh_1d(loc_full)

Then we fit the model, including both the training and test locations in the f() term by passing the mesh argument.

ngme_out_train <- ngme(
  Y ~ 0 + x1 + x2 + x3 + f(
    1:n_train,
    name = "my_ar",
    model = "ar1",
    mesh = mesh_full,
    noise = noise_nig(),
    control = control_f(numer_grad = TRUE, improve_hessian = FALSE)
  ),
  data = model_data_train,
  control_opt = control_opt(
    optimizer = adam(),
    burnin = 100,
    iterations = 1000,
    std_lim = 0.01,
    n_parallel_chain = 4,
    stop_points = 10,
    verbose = FALSE,
    print_check_info = FALSE,
    seed = 3
  )
)

Making Hold-out Predictions

The predict() function requires careful specification of the map argument, which must be a named list where each name corresponds to a latent model in the fitted object. The time indices should match the prediction period, and covariate data must be supplied for the fixed effects. Note also that we set type = "lp", which stands for linear predictor. In this case, the function returns the linear predictor, that is, the sum of the fixed and random effects, thereby providing a prediction for the response variable.

predictions_holdout <- predict(
  ngme_out_train,
  map = list(my_ar = (n_train + 1):n_obs),
  data = model_data_test,
  type = "lp",
  estimator = c("mean", "sd", "0.05q", "0.95q"),
  sampling_size = 1000,
  seed = 42
)

pred_mean_holdout <- predictions_holdout$mean
pred_sd_holdout <- predictions_holdout$sd
pred_lower_holdout <- predictions_holdout$`0.05q`
pred_upper_holdout <- predictions_holdout$`0.95q`

Technical Prediction Results

The prediction results can be evaluated using standard time series forecasting metrics. These provide quantitative measures of prediction accuracy and uncertainty calibration:

pred_errors_holdout <- pred_mean_holdout - Y_true_test
rmse_holdout <- sqrt(mean(pred_errors_holdout^2))
mae_holdout <- mean(abs(pred_errors_holdout))
correlation_holdout <- cor(pred_mean_holdout, Y_true_test)
in_interval_90 <- (Y_true_test >= pred_lower_holdout) & (Y_true_test <= pred_upper_holdout)
coverage_90 <- mean(in_interval_90)

cat("RMSE:", round(rmse_holdout, 3), "\n")
#> RMSE: 5.193
cat("MAE:", round(mae_holdout, 3), "\n")
#> MAE: 5.048
cat("Correlation:", round(correlation_holdout, 3), "\n")
#> Correlation: 0.12
cat("90% Interval Coverage:", round(coverage_90 * 100, 1), "%\n")
#> 90% Interval Coverage: 100 %

print(data.frame(
  Time = (n_train + 1):n_obs,
  Observed = round(Y_true_test, 2),
  Predicted = round(pred_mean_holdout, 2),
  Lower_90 = round(pred_lower_holdout, 2),
  Upper_90 = round(pred_upper_holdout, 2)
))
#>   Time Observed Predicted Lower_90 Upper_90
#> 1 2496     8.64      2.61    -8.05     8.98
#> 2 2497     3.45      0.62   -10.78     7.47
#> 3 2498    -4.73      0.39   -10.28     6.73
#> 4 2499    -3.21      3.07    -8.08     9.81
#> 5 2500     0.94      5.94    -4.76    12.57

Visualizing Prediction Results

Visualization of the hold-out results provides insight into the temporal behavior of predictions. The plot shows the training data context, observed test values, predictions, and uncertainty intervals:
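The plotting code is omitted here; a minimal sketch of such a figure (ggplot2 assumed attached; the size of the training context window is an arbitrary choice) is:

# Sketch of the hold-out prediction plot described above
library(ggplot2)
context_idx <- (n_train - 20):n_train
pred_idx <- (n_train + 1):n_obs
plot_df <- rbind(
  data.frame(time = context_idx, value = Y[context_idx], series = "Training data"),
  data.frame(time = pred_idx, value = Y_true_test, series = "Observed (test)"),
  data.frame(time = pred_idx, value = pred_mean_holdout, series = "Predicted mean")
)
ribbon_df <- data.frame(time = pred_idx,
                        lower = pred_lower_holdout, upper = pred_upper_holdout)
ggplot() +
  geom_ribbon(data = ribbon_df, aes(time, ymin = lower, ymax = upper), fill = "grey85") +
  geom_line(data = plot_df, aes(time, value, color = series)) +
  geom_point(data = subset(plot_df, series != "Training data"),
             aes(time, value, color = series)) +
  labs(x = "Time", y = "Y", color = NULL, title = "Hold-out predictions with 90% intervals")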

Alternative approach to hold-out validation

In this case, since we know the response variable at the test locations, there is an easier way to perform hold-out validation. First, we fit the model with the complete dataset, which we have already done and stored in the ngme_out object.

Next, we perform prediction using the ngme_out object, passing the indices of the training dataset in the train_idx argument:

predictions_holdout_alt <- predict(
  ngme_out,
  map = list(my_ar = (n_train + 1):n_obs),
  data = model_data_test,
  type = "lp",
  train_idx = 1:n_train,
  estimator = c("mean", "sd", "0.05q", "0.95q"),
  sampling_size = 1000,
  seed = 42
)

pred_mean_holdout_alt <- predictions_holdout_alt$mean
pred_sd_holdout_alt <- predictions_holdout_alt$sd
pred_lower_holdout_alt <- predictions_holdout_alt$`0.05q`
pred_upper_holdout_alt <- predictions_holdout_alt$`0.95q`

Let us check the prediction results. Observe that the estimates differ slightly from the ones obtained above. This is because this prediction uses the parameters fitted on the complete dataset, making it a pseudo hold-out validation.

pred_errors_holdout_alt <- pred_mean_holdout_alt - Y_true_test
rmse_holdout_alt <- sqrt(mean(pred_errors_holdout_alt^2))
mae_holdout_alt <- mean(abs(pred_errors_holdout_alt))
correlation_holdout_alt <- cor(pred_mean_holdout_alt, Y_true_test)
in_interval_90_alt <- (Y_true_test >= pred_lower_holdout_alt) & 
                      (Y_true_test <= pred_upper_holdout_alt)
coverage_90_alt <- mean(in_interval_90_alt)

cat("RMSE:", round(rmse_holdout_alt, 3), "\n")
#> RMSE: 5.118
cat("MAE:", round(mae_holdout_alt, 3), "\n")
#> MAE: 4.915
cat("Correlation:", round(correlation_holdout_alt, 3), "\n")
#> Correlation: 0.07
cat("90% Interval Coverage:", round(coverage_90_alt * 100, 1), "%\n")
#> 90% Interval Coverage: 100 %

print(data.frame(
  Time = (n_train + 1):n_obs,
  Observed = round(Y_true_test, 2),
  Predicted = round(pred_mean_holdout_alt, 2),
  Lower_90 = round(pred_lower_holdout_alt, 2),
  Upper_90 = round(pred_upper_holdout_alt, 2)
))
#>   Time Observed Predicted Lower_90 Upper_90
#> 1 2496     8.64      1.87   -11.81     8.74
#> 2 2497     3.45      0.76    -9.45     7.31
#> 3 2498    -4.73      0.14   -12.23     6.85
#> 4 2499    -3.21      2.84    -9.30     9.71
#> 5 2500     0.94      5.15    -8.50    12.71

Interpolation for Missing Values

AR(1) models are well suited for interpolating missing values because they exploit the temporal dependence between neighboring observations. This makes them especially useful when dealing with irregular sampling or short gaps caused by sensor outages in time series data.

To illustrate this, we introduce artificial gaps at different points in the series and use the AR(1) model to reconstruct the missing values:

gap1_indices <- 601:605
gap2_indices <- 1000:1005 
gap3_indices <- 2100:2106
all_gap_indices <- c(gap1_indices, gap2_indices, gap3_indices)

Y_true_gaps <- Y[all_gap_indices]
train_indices_interp <- setdiff(1:n_obs, all_gap_indices)
model_data_train_interp <- model_data[train_indices_interp, ]
Y_train_vector <- Y[train_indices_interp]

cat("Total gaps to interpolate:", length(all_gap_indices), "\n")
#> Total gaps to interpolate: 18
cat("Training observations:", length(train_indices_interp), "\n")
#> Training observations: 2482

Fitting Model for Interpolation

The interpolation model is fitted using only the non-gap observations. The time indices in the f() function correspond to the actual observation times, allowing the model to handle irregular time grids naturally:

ngme_out_interp <- ngme(
  Y_train_vector ~ 0 + x1 + x2 + x3 + f(
    train_indices_interp,
    name = "my_ar_interp",
    model = "ar1", 
    noise = noise_nig(),
    control = control_f(numer_grad = TRUE, improve_hessian = FALSE)
  ),
  data = model_data_train_interp,
  control_opt = control_opt(
    optimizer = adam(),
    burnin = 100,
    iterations = 1000,
    std_lim = 0.01,
    n_parallel_chain = 4,
    stop_points = 10,
    verbose = FALSE,
    print_check_info = FALSE,
    seed = 42
  )
)

Making Interpolation Predictions

The interpolation predictions are generated for the gap locations using the corresponding covariate values. The model leverages temporal correlation from neighboring observed points to estimate the missing values:

gap_covariates <- data.frame(
  x1 = x1[all_gap_indices],
  x2 = x2[all_gap_indices], 
  x3 = x3[all_gap_indices]
)

interpolation_pred <- predict(
  ngme_out_interp,
  map = list(my_ar_interp = all_gap_indices),
  data = gap_covariates,
  type = "lp",
  estimator = c("mean", "sd", "0.05q", "0.95q"),
  sampling_size = 1000,
  seed = 42
)

interp_mean <- interpolation_pred$mean
interp_sd <- interpolation_pred$sd  
interp_lower <- interpolation_pred[["0.05q"]]
interp_upper <- interpolation_pred[["0.95q"]]

Technical Interpolation Results

The interpolation performance can be assessed by comparing predicted values with the true (known) values at gap locations. These metrics demonstrate the effectiveness of temporal correlation for missing value estimation:

interp_errors <- interp_mean - Y_true_gaps
rmse_interp <- sqrt(mean(interp_errors^2))
mae_interp <- mean(abs(interp_errors))
correlation_interp <- cor(interp_mean, Y_true_gaps)
in_interval_90_interp <- (Y_true_gaps >= interp_lower) & (Y_true_gaps <= interp_upper)
coverage_90_interp <- mean(in_interval_90_interp)

cat("RMSE:", round(rmse_interp, 3), "\n")
#> RMSE: 7.732
cat("MAE:", round(mae_interp, 3), "\n") 
#> MAE: 3.951
cat("Correlation:", round(correlation_interp, 3), "\n")
#> Correlation: 0.224
cat("90% Interval Coverage:", round(coverage_90_interp * 100, 1), "%\n")
#> 90% Interval Coverage: 88.9 %

print(data.frame(
  Time = all_gap_indices[1:10],
  Observed = round(Y_true_gaps[1:10], 2),
  Interpolated = round(interp_mean[1:10], 2),
  Lower_90 = round(interp_lower[1:10], 2),
  Upper_90 = round(interp_upper[1:10], 2)
))
#>    Time Observed Interpolated Lower_90 Upper_90
#> 1   601     2.74        -2.58   -15.87     3.73
#> 2   602     0.82        -2.78   -14.13     4.16
#> 3   603     1.54        -0.53   -11.97     6.21
#> 4   604     1.57         1.37    -9.98     8.10
#> 5   605     6.94         2.01    -5.44     7.37
#> 6  1000     1.76         1.34    -8.53     7.44
#> 7  1001    -0.06         0.61   -10.92     7.20
#> 8  1002     0.31         0.94   -10.39     7.40
#> 9  1003     2.91         3.91    -5.49    10.15
#> 10 1004     7.54         2.70    -4.38     8.32

Visualizing Interpolation Results

The interpolation visualization shows how the model fills gaps across different time periods. Each subplot focuses on a specific gap region with surrounding temporal context to illustrate the bidirectional information flow:
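The plotting code is not included; a minimal sketch for one gap region (the remaining gaps can be plotted analogously; ggplot2 assumed attached, and the context window is an arbitrary choice) is:

# Sketch of one interpolation panel (gap 1, indices 601:605)
library(ggplot2)
context <- 585:620
gap_pos <- match(gap1_indices, all_gap_indices)
obs_df <- data.frame(time = setdiff(context, gap1_indices),
                     Y = Y[setdiff(context, gap1_indices)])
gap_df <- data.frame(time = gap1_indices,
                     observed = Y_true_gaps[gap_pos],
                     interpolated = interp_mean[gap_pos],
                     lower = interp_lower[gap_pos],
                     upper = interp_upper[gap_pos])
ggplot() +
  geom_line(data = obs_df, aes(time, Y), color = "grey50") +
  geom_point(data = gap_df, aes(time, observed), color = "black") +
  geom_pointrange(data = gap_df,
                  aes(time, interpolated, ymin = lower, ymax = upper),
                  color = "steelblue") +
  labs(title = "Interpolation for gap 1 (indices 601-605)", x = "Time", y = "Y")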

Posterior Simulation

After fitting and validating our AR(1) model, we can generate posterior samples to quantify uncertainty in both the latent process and the response predictions. This Bayesian approach provides comprehensive uncertainty characterization beyond simple point estimates.

Generating Posterior Samples

The ngme2 package provides functionality to generate posterior samples through the ngme_post_samples() function. These samples represent different plausible realizations of the latent AR(1) process given the observed data and estimated parameters:

posterior_samples <- ngme_post_samples(ngme_out)

cat("Number of time points:", nrow(posterior_samples), "\n")
#> Number of time points: 2500
cat("Number of posterior samples:", ncol(posterior_samples), "\n")
#> Number of posterior samples: 100
cat("Sample names:", names(posterior_samples)[1:5], "...\n")
#> Sample names: sample_1 sample_2 sample_3 sample_4 sample_5 ...

Latent Process Posterior Analysis

We first analyze the posterior uncertainty in the latent AR(1) process (W). The posterior samples allow us to quantify how well we can recover the underlying temporal structure:

W_posterior_mean <- apply(posterior_samples, 1, mean)
W_posterior_sd <- apply(posterior_samples, 1, sd)
W_posterior_q025 <- apply(posterior_samples, 1, function(x) quantile(x, 0.025))
W_posterior_q975 <- apply(posterior_samples, 1, function(x) quantile(x, 0.975))

comparison_data <- data.frame(
  time = 1:n_obs,
  true_W = W_nig,
  posterior_mean = W_posterior_mean,
  posterior_lower = W_posterior_q025,
  posterior_upper = W_posterior_q975
)

cat("Mean absolute difference from true values:", 
    round(mean(abs(W_posterior_mean - W_nig)), 3), "\n")
#> Mean absolute difference from true values: 1.35
cat("Average posterior standard deviation:", 
    round(mean(W_posterior_sd), 3), "\n")
#> Average posterior standard deviation: 1.679

Latent Process Uncertainty Visualization

This visualization shows how well the model recovers the underlying latent AR(1) process by comparing the posterior estimates with the true latent values:
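The figure itself is produced by code not shown; a minimal sketch using the comparison_data frame built above (ggplot2 assumed attached; the zoom window is an arbitrary choice so individual paths remain visible) is:

# Sketch of the latent-process recovery plot described above
library(ggplot2)
window_idx <- 1:200
ggplot(comparison_data[window_idx, ], aes(x = time)) +
  geom_ribbon(aes(ymin = posterior_lower, ymax = posterior_upper),
              fill = "steelblue", alpha = 0.3) +
  geom_line(aes(y = posterior_mean), color = "steelblue") +
  geom_line(aes(y = true_W), color = "black", linetype = "dashed") +
  labs(x = "Time", y = "Latent process W",
       title = "Posterior mean and 95% interval vs. true latent AR(1) process")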

Response Variable Posterior Predictive Check

We now assess model adequacy by generating posterior predictive samples for the response variable (Y) and comparing them with the observed data:
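The code producing this check is not included above; below is a minimal sketch, assuming the objects created earlier (posterior_samples, X, params_data, Y) are available and that replicated responses are built as fixed effects plus latent samples plus Gaussian measurement noise, mirroring the data-generating model used in this vignette:

# Sketch of a posterior predictive check for the response Y
set.seed(99)
beta_hat  <- as.numeric(params_data$fixed_effects)
sigma_hat <- as.numeric(params_data$sigma)

# One replicated response per posterior sample of the latent process
Y_rep <- apply(as.matrix(posterior_samples), 2, function(w) {
  as.numeric(X %*% beta_hat) + w + rnorm(n_obs, sd = sigma_hat)
})

# Compare simple summaries of observed and replicated data
c(observed_mean = mean(Y), replicated_mean = mean(Y_rep))
c(observed_sd   = sd(Y),   replicated_sd   = mean(apply(Y_rep, 2, sd)))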

Response Variable Prediction Intervals

Finally, we assess the calibration of our prediction intervals by examining how well the posterior predictive samples capture the observed data:
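A minimal sketch of such a calibration check, continuing from the Y_rep matrix in the sketch above, is:

# Sketch: empirical coverage of pointwise 95% posterior predictive intervals
Y_lower <- apply(Y_rep, 1, quantile, probs = 0.025)
Y_upper <- apply(Y_rep, 1, quantile, probs = 0.975)
coverage <- mean(Y >= Y_lower & Y <= Y_upper)
print(paste("Empirical coverage of 95% intervals:", round(coverage * 100, 1), "%"))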