Introduction

Time series analysis is often used to model temporal dependencies in data, where observations at different points in time are correlated.

The autoregressive model of order 1 (AR(1)) provides a simple but widely used framework in which each new value of a series is predicted from the current value plus a random error term. Most AR(1) implementations assume that the random errors follow a normal (Gaussian) distribution, and such models are typically fitted using packages like forecast or stats.

The ngme2 package extends traditional AR(1) models by allowing for non-Gaussian noise distributions, particularly the Normal Inverse Gaussian (NIG) distribution, a flexible distribution with skewness and heavy tails that can capture the asymmetric, heavy-tailed behavior commonly observed in real-world time series data. The package also supports the Generalized Asymmetric Laplace (GAL) distribution, which accommodates peakedness, skewness, and heavy tails.

Additionally, ngme2 enables the use of AR(1) processes as latent models within linear regression frameworks with mixed effects. This capability allows researchers to model complex hierarchical data structures where temporal dependencies exist at the latent level, while simultaneously accounting for fixed effects and other random effects. This integration makes it possible to capture both the temporal correlation structure through the AR(1) latent process and the heterogeneity across different groups or experimental units through mixed effects modeling.

This vignette provides a comprehensive guide to implementing and using AR(1) models within the ngme2 framework.

Model Specification

AR(1) Process Definition

An autoregressive model of order 1 (AR(1)) is a stochastic process where each observation depends linearly on the immediately preceding observation plus a random error term. The AR(1) process is mathematically defined as:

X1โˆผ๐’Ÿ(0,ฯƒ21โˆ’ฯ2),Xi=ฯXiโˆ’1+ฮดi,i=2,3,โ€ฆ,n,\begin{align} X_1 &\sim \mathcal{D}\left(0,\,\frac{\sigma^2}{1-\rho^2}\right), \\ X_i &= \rho X_{i-1} + \delta_i, \quad i = 2, 3, \ldots, n, \end{align}

where:

  • $\mathcal{D}(0, v)$ denotes a zero-mean distribution with variance $v$ (e.g., Gaussian, NIG, or GAL);
  • $\rho$ is the autoregressive parameter, with $|\rho| < 1$ (stationarity condition);
  • $\delta_i \sim \text{i.i.d.}(0, \sigma^2)$ are independent and identically distributed innovation terms with variance $\sigma^2$;
  • $n$ is the total number of observations.
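
To make this definition concrete, below is a minimal base-R sketch that simulates a Gaussian AR(1) path directly from the recursion above. The parameter values are purely illustrative; ngme2's own simulation interface is introduced later in this vignette.

# Minimal base-R simulation of the AR(1) definition (illustrative values)
set.seed(1)
n     <- 1000
rho   <- 0.7
sigma <- 1

x <- numeric(n)
# X_1 is drawn from the stationary distribution D(0, sigma^2 / (1 - rho^2))
x[1] <- rnorm(1, mean = 0, sd = sigma / sqrt(1 - rho^2))
# Recursion X_i = rho * X_{i-1} + delta_i with Gaussian innovations
for (i in 2:n) x[i] <- rho * x[i - 1] + rnorm(1, sd = sigma)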

Note on Innovation Distributions: For non-Gaussian innovations, the mean and variance do not fully specify the distribution, although they suffice for the classical exposition of AR models given above:

  • For NIG innovations: $\delta_i$ follows a Normal Inverse Gaussian distribution with additional parameters (location, scale, asymmetry, and tail heaviness), set to ensure zero mean and variance $\sigma^2$;
  • For GAL innovations: $\delta_i$ follows a Generalized Asymmetric Laplace distribution with extra parameters, also set to ensure zero mean and variance $\sigma^2$.

Note on Markov Property: The AR(1) process exhibits the first-order Markov property, meaning that the conditional distribution of any observation $X_i$ given the entire past depends only on the immediate predecessor $X_{i-1}$. Formally, for any $i \geq 2$:

$$P(X_i \leq x \mid X_1, X_2, \ldots, X_{i-1}) = P(X_i \leq x \mid X_{i-1})$$

This property emerges directly from the recursive structure $X_i = \rho X_{i-1} + \delta_i$ combined with the independence of the innovations $\delta_i$. Since $\delta_i$ is independent of all past values $(X_1, \ldots, X_{i-1})$ and $X_i$ is completely determined by $X_{i-1}$ and $\delta_i$, knowledge of the entire history provides no additional information beyond $X_{i-1}$ for predicting $X_i$. This Markov structure is fundamental to the computational efficiency of AR(1) models, as it implies that the precision matrix has a sparse, banded structure that enables efficient numerical algorithms.

Note on Initial Condition: The scaling factor $\frac{1}{\sqrt{1-\rho^2}}$ in the initial condition ensures that $X_1$ has the correct marginal variance for stationarity. Specifically, $$\text{Var}(X_1) = \frac{1}{1-\rho^2} \text{Var}(\delta_1) = \frac{\sigma^2}{1-\rho^2}.$$ This guarantees that the initial variance matches the stationary distribution of the AR(1) process, regardless of the value of $\sigma^2$.
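
Continuing the base-R sketch above, we can check this numerically: the sample variance of the simulated path should be close to the stationary value $\sigma^2/(1-\rho^2) = 1/(1 - 0.7^2) \approx 1.96$.

# Compare the theoretical stationary variance with the sample variance
c(theoretical = sigma^2 / (1 - rho^2), empirical = var(x))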

Properties

Stationarity Condition: The constraint $|\rho| < 1$ ensures that the process is stationary, meaning its statistical properties remain constant over time.

Variance Structure: In the stationary distribution, all observations have the same marginal variance $\text{Var}(X_i) = \frac{\sigma^2}{1-\rho^2}$.

Autocorrelation Function: For a stationary AR(1) process, the autocorrelation function exhibits exponential decay: $\text{Corr}(X_t, X_{t+k}) = \rho^k$, where $k$ is the lag. This characteristic pattern distinguishes AR(1) processes from other time series models.

Partial Autocorrelation Function: The partial autocorrelation function (PACF) of an AR(1) process has a distinctive pattern: $$\text{PACF}(k) = \begin{cases} \rho & \text{if } k = 1 \\ 0 & \text{if } k > 1 \end{cases}$$ This cutoff after lag 1 is a key diagnostic feature for identifying AR(1) processes in practice.
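
Both properties can be verified with ARMAacf() from the stats package, which returns the theoretical ACF and PACF of an ARMA process. For an AR(1) with $\rho = 0.7$, the ACF decays as $0.7^k$ while the PACF vanishes after lag 1:

# Theoretical ACF of AR(1) with rho = 0.7: 1, 0.7, 0.49, 0.343, ...
ARMAacf(ar = 0.7, lag.max = 5)
# Theoretical PACF: 0.7 at lag 1, exactly 0 at all higher lags
ARMAacf(ar = 0.7, lag.max = 5, pacf = TRUE)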

Noise Distributions: In ngme2, the innovation terms $\delta_i$ can follow one of the following distributions:

  • Gaussian distribution: Traditional approach with symmetric, light-tailed errors;

  • Normal Inverse Gaussian (NIG) distribution: Flexible distribution allowing for skewness and heavy tails;

  • Generalized Asymmetric Laplace (GAL) distribution: Flexible distribution accommodating peakedness, skewness, and heavy tails.

Irregular Time Grids: The ngme2 package supports AR(1) processes on irregular time grids, allowing for gaps in the time series. This is particularly useful for modeling data with missing time points or irregular sampling intervals.

Matrix Representation

The AR(1) process can be expressed in matrix form as $$\mathbf{K}\mathbf{X} = \boldsymbol{\delta},$$ where $\mathbf{X} = (X_1, \ldots, X_n)^T$, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)^T$, and $\mathbf{K}$ is a matrix that encodes the AR(1) structure.

The matrix ๐Š\mathbf{K} is: ๐Š=[1โˆ’ฯ200โ‹ฏ0โˆ’ฯ10โ‹ฏ00โˆ’ฯ1โ‹ฏ0โ‹ฎโ‹ฎโ‹ฑโ‹ฑโ‹ฎ00โ‹ฏโˆ’ฯ1] \mathbf{K} = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}

Justification of Matrix Construction: The first row, with $\sqrt{1-\rho^2}$, ensures that $X_1$ has the correct marginal variance for stationarity. The subsequent rows implement the recursive relationship $X_i = \rho X_{i-1} + \delta_i$, rearranged as $-\rho X_{i-1} + X_i = \delta_i$.
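
As a quick illustration, $\mathbf{K}$ can be assembled by hand as a sparse matrix. This is only a sketch with illustrative values ($n = 5$, $\rho = 0.7$); ngme2 constructs this matrix internally, as shown in the implementation section below.

library(Matrix)

# Hand-built K: sqrt(1 - rho^2) in the (1,1) entry, 1 on the rest of the
# diagonal, and -rho on the subdiagonal
n   <- 5
rho <- 0.7
K <- bandSparse(n, n, k = c(-1, 0),
  diagonals = list(rep(-rho, n - 1),
                   c(sqrt(1 - rho^2), rep(1, n - 1))))
K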

The matrix ๐Š\mathbf{K} relates the correlated AR(1) process values ๐—\mathbf{X} to the independent noise terms ๐›…\boldsymbol{\delta}. This structure allows us to express the precision matrix (inverse of the covariance matrix) of the AR(1) as $\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigmaห†2$, meaning that ๐Š\mathbf{K} serves as a Cholesky-like factor of ๐\mathbf{Q}. Furthermore, the resulting precision matrix inherits the sparse structure from the Markov property of AR(1) processes, leading to significant computational advantages in terms of both memory usage and processing speed, particularly when dealing with long time series.

Precision Matrix

For the AR(1) process with the above matrix $\mathbf{K}$ and error variance $\sigma^2$, the precision matrix is:

$$\mathbf{Q} = \frac{\mathbf{K}^T \mathbf{K}}{\sigma^2} = \frac{1}{\sigma^2}\begin{bmatrix} 1 & -\rho & 0 & \cdots & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 \\ 0 & -\rho & 1+\rho^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{bmatrix}$$

Note that the first diagonal entry is $(1-\rho^2) + \rho^2 = 1$, and the last is also $1$ because $X_n$ enters only one equation.

Interpretation: The precision matrix $\mathbf{Q}$ encodes the conditional independence structure of the AR(1) process. Its tridiagonal structure reflects the Markov property: each observation is conditionally independent of all others given its immediate neighbors.
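
Using the hand-built $\mathbf{K}$ from the sketch above, we can confirm this structure directly: $\mathbf{K}^T\mathbf{K}/\sigma^2$ is tridiagonal, with diagonal entries $1, 1+\rho^2, \ldots, 1+\rho^2, 1$ and off-diagonal entries $-\rho$.

# Q = K^T K / sigma^2 (crossprod(K) computes t(K) %*% K)
sigma <- 1
Q <- crossprod(K) / sigma^2
Q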

Matrix Relationships Summary

  • Matrix ๐Š\mathbf{K}: Maps correlated AR(1) process to independent noise (innovation form)
  • Precision matrix ๐=๐ŠT๐Š/ฯƒ2\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigma^2: Inverse of the covariance matrix
  • Covariance matrix ๐‚=๐โˆ’1\mathbf{C} = \mathbf{Q}^{-1}: Contains the covariances between all pairs of time points
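
Continuing the sketch, inverting $\mathbf{Q}$ recovers the stationary AR(1) covariance $\text{Cov}(X_i, X_j) = \sigma^2 \rho^{|i-j|}/(1-\rho^2)$, which ties the three matrices together:

# C = Q^{-1} should equal sigma^2 * rho^|i-j| / (1 - rho^2) entrywise
C <- solve(Q)
C_theory <- sigma^2 * rho^abs(outer(1:n, 1:n, "-")) / (1 - rho^2)
max(abs(C - C_theory))  # numerically zero, up to floating-point error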

This matrix formulation not only establishes a direct connection to the precision matrix $\mathbf{Q}$ by computing the innovations from the process values, but also integrates seamlessly with other ngme2 components. Factorizing the AR(1) process in this way yields both conceptual clarity and simplified implementation, especially in high-dimensional settings, where operating on the sparse factors $\mathbf{K}$ and $\mathbf{Q}$ is far cheaper than manipulating the dense covariance matrix directly. This is the general structure of the latent models supported by the ngme2 package.

Implementation in ngme2

Basic Syntax

To specify an AR(1) model in ngme2, use the f() function with the model = "ar1" argument within your model formula: f(time_index, model = "ar1"). If the noise argument of f() is not specified, the process defaults to Gaussian noise. For simulation purposes, it is important to set the sigma parameter in the noise specification, as it controls the variance of the innovations. For estimation, specifying sigma is optional, since the model estimates this parameter from the observed data.

Basic Usage Example

When specifying the time grid, keep in mind that the AR(1) process in ngme2 requires integer time indices, although these do not need to be consecutive (gaps are allowed). This flexibility is particularly useful for modeling time series with irregular observation patterns or missing data points.

For illustration, we will create an AR(1) model using an irregular time grid with integer years. This demonstrates how ngme2 handles non-consecutive time points:

library(ngme2)
#> This is ngme2 of version 0.7.0
#> - See our homepage: https://davidbolin.github.io/ngme2 for more details.
#> 
#> Attaching package: 'ngme2'
#> The following object is masked from 'package:stats':
#> 
#>     ar
set.seed(16)

# Define time points with gaps (irregular spacing)
time_points <- c(2001, 2003, 2005, 2007)

Observe that in ngme2, it is possible to have gaps in the time grid, and the observed years can be provided in any order. For example, in this case, data for 2002, 2004, and 2006 are missing. The AR(1) model will still maintain the temporal dependence structure across these gaps. Internally, ngme2 automatically constructs the sorted, complete time grid (e.g., 2001, 2002, ..., 2007) as needed for the precision matrix, so the user does not need to handle this manually. Consequently, the precision matrix will always have dimensions determined by the complete time grid, while an observation matrix A is used to map the observed time points to the corresponding points on the complete grid.

Now we will specify the AR(1) model with a negative autocorrelation parameter, which creates an oscillating pattern in which high values tend to be followed by low values:

# Define AR(1) model with negative autocorrelation
ar1_model <- f(time_points,
  model = "ar1",
  rho = -0.5,            # Negative correlation creates oscillating behavior
  noise = noise_normal(sigma = 1.0)    # Gaussian innovations with sigma = 1
)

The ngme2 package constructs several important matrices internally. We can examine the structure of the matrix $\mathbf{K}$ that relates the AR(1) process to the independent noise terms:

# Examine the K matrix structure
K_matrix <- ar1_model$operator$K
print(K_matrix)
#> 7 x 7 sparse Matrix of class "dgCMatrix"
#>                                     
#> [1,] 0.8660254 .   .   .   .   .   .
#> [2,] 0.5000000 1.0 .   .   .   .   .
#> [3,] .         0.5 1.0 .   .   .   .
#> [4,] .         .   0.5 1.0 .   .   .
#> [5,] .         .   .   0.5 1.0 .   .
#> [6,] .         .   .   .   0.5 1.0 .
#> [7,] .         .   .   .   .   0.5 1

The matrix ๐Š\mathbf{K} shows the characteristic AR(1) structure:

  • The first row contains $\sqrt{1-\rho^2} = \sqrt{1-(-0.5)^2} = \sqrt{0.75} \approx 0.866$
  • Subsequent rows have the pattern $[-\rho, 1] = [0.5, 1]$, representing the AR(1) recursion

The mapping matrix ๐€\mathbf{A} tells us how the internal time grid indices correspond to our specified time points:

# Check the mapping matrix A
mapping_matrix <- ar1_model$A
print(mapping_matrix)
#> 4 x 7 sparse Matrix of class "dgCMatrix"
#>                   
#> [1,] 1 0 . . . . .
#> [2,] . . 1 0 . . .
#> [3,] . . . . 1 0 .
#> [4,] . . . . . 0 1

The mapping matrix ๐€\mathbf{A} serves as a crucial bridge between the observed time indices, which were provided by the user, and the internal computational structure of the AR(1) model. When working with consecutive time indices, this matrix typically reduces to an identity matrix, providing a direct one-to-one correspondence. However, its true utility becomes apparent when dealing with irregular time grids, where it elegantly handles the mapping between the observed, user-specified, time points and the complete time grid used internally which is required for efficient computation. This mapping mechanism allows ngme2 to seamlessly accommodate missing observations, irregular sampling intervals, and complex temporal patterns without compromising the fundamental AR(1) structure.

# Load Matrix package for sparse matrix operations
library(Matrix)

# The precision matrix is computed as Q = (K^T K) / sigma^2
sigma2 <- 1.0  # Error variance from the model definition

Q_matrix <- (t(K_matrix) %*% K_matrix) / sigma2
print(Q_matrix)
#> 7 x 7 sparse Matrix of class "dgCMatrix"
#>                                      
#> [1,] 1.0 0.50 .    .    .    .    .  
#> [2,] 0.5 1.25 0.50 .    .    .    .  
#> [3,] .   0.50 1.25 0.50 .    .    .  
#> [4,] .   .    0.50 1.25 0.50 .    .  
#> [5,] .   .    .    0.50 1.25 0.50 .  
#> [6,] .   .    .    .    0.50 1.25 0.5
#> [7,] .   .    .    .    .    0.50 1.0

This matrix formulation provides several computational benefits, particularly in terms of memory efficiency and processing speed. Let us examine the sparsity of the precision matrix in the illustration above, where we have 7 time-grid points (years spanning 2001 to 2007) associated with our 4 irregularly spaced observed time points:

# Load required libraries
library(ggplot2)
library(reshape2)
library(viridis)
#> Loading required package: viridisLite

# Convert the precision matrix to long format for ggplot2
Q_melted <- melt(as.matrix(Q_matrix))
names(Q_melted) <- c("Row", "Column", "Value")

# Create a binary sparsity indicator
Q_melted$Sparsity <- ifelse(abs(Q_melted$Value) < 1e-10, "Zero", "Non-zero")

# Simple binary sparsity pattern 
p_sparsity_simple <- ggplot(Q_melted, aes(x = Column, y = rev(Row), fill = Sparsity)) +
  geom_tile(color = "white", linewidth = 0.1) +
  scale_fill_manual(values = c("Zero" = "white", "Non-zero" = "#2E86AB")) +
  scale_x_continuous(breaks = 1:7, expand = c(0, 0)) +
  scale_y_continuous(breaks = 1:7, labels = 7:1, expand = c(0, 0)) +
  labs(
    title = "Sparsity Pattern of AR(1) Precision Matrix",
    x = "Column Index",
    y = "Row Index",
    fill = "Elements"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 10, face = "bold"),
    axis.text = element_text(size = 7),
    axis.title = element_text(size = 8),
    legend.position = "right",
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 7),
    panel.grid = element_blank(),
    aspect.ratio = 1
  ) +
  coord_fixed()

print(p_sparsity_simple)   

Now let us examine the sparsity pattern of the precision matrix, which is crucial for computational efficiency:

# Check sparsity (important for computational efficiency)
Q_dense <- as.matrix(Q_matrix)
total_elements <- nrow(Q_matrix) * ncol(Q_matrix)
zero_elements <- sum(abs(Q_dense) < 1e-10)
sparsity <- zero_elements / total_elements
print(paste("Sparsity of precision matrix:", round(sparsity * 100, 1), "%"))
#> [1] "Sparsity of precision matrix: 61.2 %"

The resulting precision matrix maintains its characteristic tridiagonal structure (with the first and last diagonal entries equal to 1 rather than $1+\rho^2$), demonstrating that the temporal gaps do not compromise the fundamental sparsity properties that make AR(1) models computationally tractable. This sparsity is particularly valuable when scaling to longer time series, as it ensures that computational complexity grows linearly rather than quadratically with the number of observations.

Observe that with only 7 time-grid points (induced by our four observed time points), the resulting precision matrix already exhibits substantial sparsity. As the number of points in the time grid increases, this sparsity becomes even more pronounced, as the quick calculation below illustrates, greatly enhancing computational efficiency. This scaling property makes AR(1) models in ngme2 particularly well-suited for long time series applications.
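
A back-of-the-envelope calculation makes this concrete: a tridiagonal precision matrix on $n$ grid points has at most $3n - 2$ non-zero entries out of $n^2$, so the fraction of zeros approaches 100% as the grid grows.

# Percentage of zero entries for increasing grid sizes
n_grid <- c(7, 50, 500)
round(100 * (1 - (3 * n_grid - 2) / n_grid^2), 1)
# 61.2 94.1 99.4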

Finally, let us demonstrate how different error variances affect the precision matrix:

# Demonstrate how different error variances affect the precision matrix
sigma2_values <- c(0.5, 2.0, 4.0)
for(sigma2_test in sigma2_values) {
  Q_scaled <- (t(K_matrix) %*% K_matrix) / sigma2_test
  print(paste("With sigma^2 =", sigma2_test, 
              ", Q[1,1] =", round(Q_scaled[1,1], 3),
              "(scaled by factor 1/", sigma2_test, ")"))
}
#> [1] "With sigma^2 = 0.5 , Q[1,1] = 2 (scaled by factor 1/ 0.5 )"
#> [1] "With sigma^2 = 2 , Q[1,1] = 0.5 (scaled by factor 1/ 2 )"
#> [1] "With sigma^2 = 4 , Q[1,1] = 0.25 (scaled by factor 1/ 4 )"

The resulting precision matrix ๐=๐ŠT๐Š/ฯƒ2\mathbf{Q} = \mathbf{K}^T \mathbf{K}/\sigma^2 is sparse, which is computationally advantageous for large time series. This sparsity reflects the Markov property of AR(1) processes: each observation depends only on its immediate neighbors, not on distant past values. The scaling by 1/ฯƒ21/\sigma^2 ensures that the precision matrix correctly represents the inverse covariance structure, with larger error variances resulting in lower precision (higher uncertainty) and vice versa. When defining AR(1) models in practice, it is important to specify the error variance explicitly (e.g., noise_normal(sigma = 1.0)) to maintain consistency between the theoretical formulation and computational implementation.

Simulation

Understanding AR(1) model behavior through simulation helps build intuition about temporal dependencies and the impact of different noise distributions. The ngme2 package provides straightforward simulation capabilities that allow us to explore these characteristics systematically.

Setting Up the Simulation Environment

First, let us prepare our environment and define the basic parameters for our simulations:

library(ngme2)
library(gridExtra)
library(dplyr)
set.seed(456)

# Define simulation parameters
n_obs <- 2500
time_index <- 1:n_obs

Gaussian AR(1) Simulation

We begin with the traditional Gaussian AR(1) model, which assumes normally distributed innovations. This serves as our baseline for comparison. First, we create the AR(1) model object using the f() function, which specifies the model structure, parameters, and noise distribution. The f() function returns a model object that contains all the necessary information for simulation and estimation, including the temporal dependence structure ($\rho = 0.7$) and the Gaussian noise specification ($\sigma = 2.0$).

# Define AR(1) model with Gaussian noise
ar1_gaussian <- f(time_index, 
  model = "ar1", 
  rho = 0.7,
  noise = noise_normal(sigma = 2.0)
)

In the second step, we obtain sample realizations from this model using the simulate() method, which generates random samples according to the specified AR(1) process. The simulate() method takes the model object as input along with optional arguments such as the random seed (seed = 456) for reproducibility and the number of simulations (nsim = 1). This two-step approach separates model specification from sample generation, providing flexibility for multiple simulations or parameter exploration.

# Generate realization from the AR(1) model
W_gaussian <- simulate(ar1_gaussian, seed = 456, nsim = 1)[[1]]

Let us visualize the simulated Gaussian AR(1) process and examine its temporal structure. First, we calculate the autocorrelation and partial autocorrelation functions of the simulated series; these are essential diagnostic tools for verifying the temporal structure of the model:

# Prepare data for ggplot
gaussian_data <- data.frame(
  time = time_index,
  value = W_gaussian
)

# Calculate autocorrelation function
acf_result <- acf(W_gaussian, lag.max = 20, plot = FALSE)
acf_data <- data.frame(
  lag = 0:20,
  acf = as.numeric(acf_result$acf)
)

# Calculate partial autocorrelation function
pacf_result <- pacf(W_gaussian, lag.max = 20, plot = FALSE)
pacf_data <- data.frame(
  lag = 1:20,
  pacf = as.numeric(pacf_result$acf)
)

Now we will visualize the simulated process and its correlation structure. The time series plot shows the temporal evolution of the process, while the ACF and PACF plots reveal the characteristic patterns that confirm our AR(1) model specification:

# Create time series plot
p1 <- ggplot(gaussian_data, aes(x = time, y = value)) +
  geom_line(color = "#2E86AB", alpha = 0.8, linewidth = 0.3) +
  labs(
    title = "AR(1) Process with Gaussian Noise",
    subtitle = expression(paste(rho, " = 0.7, ", sigma, " = 2.0")),
    x = "Time",
    y = "Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    plot.subtitle = element_text(size = 10),
    panel.grid.minor = element_blank()
  )

# Plot autocorrelation function
p2 <- ggplot(acf_data, aes(x = lag, y = acf)) +
  geom_hline(yintercept = 0, color = "gray50") +
  geom_hline(yintercept = c(-0.2, 0.2), color = "blue", linetype = "dashed", alpha = 0.4, linewidth = 0.4) +
  geom_segment(aes(xend = lag, yend = 0), color = "#2E86AB", linewidth = 0.4) +
  geom_point(color = "#2E86AB", size = 1) +
  labs(
    title = "Sample Autocorrelation Function",
    x = "Lag",
    y = "ACF"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),   
    panel.grid.minor = element_blank()
  )

# Plot partial autocorrelation function
p3 <- ggplot(pacf_data, aes(x = lag, y = pacf)) +
  geom_hline(yintercept = 0, color = "gray50") +
  geom_hline(yintercept = c(-0.2, 0.2), color = "blue", linetype = "dashed", alpha = 0.4, linewidth = 0.4) +
  geom_segment(aes(xend = lag, yend = 0), color = "#2E86AB", linewidth = 0.4) +
  geom_point(color = "#2E86AB", size = 1) +
  labs(
    title = "Sample Partial Autocorrelation Function (PACF)",
    x = "Lag",
    y = "PACF"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),   
    panel.grid.minor = element_blank()
  )

# Combine plots
grid.arrange(p1, p2, p3, ncol = 1)

The autocorrelation function shows the expected exponential decay pattern characteristic of AR(1) processes, while the partial autocorrelation function exhibits the diagnostic cutoff after lag 1, confirming the model's temporal structure. The PACF pattern is particularly useful for identifying the order of autoregressive processes.
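
As a further check, we can compare the sample ACF computed above with the theoretical decay $\rho^k = 0.7^k$ at the first few lags (a small sketch reusing acf_data):

# Sample versus theoretical autocorrelations at lags 1-5
data.frame(
  lag         = 1:5,
  sample_acf  = round(acf_data$acf[2:6], 3),
  theoretical = round(0.7^(1:5), 3)
)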

NIG AR(1) Simulation

Now let us explore the flexibility of non-Gaussian innovations using the Normal Inverse Gaussian (NIG) distribution, which can capture asymmetry and heavy tails. Here we create an AR(1) model object with NIG innovations using the f() function. The NIG distribution is specified through noise_nig() with three parameters: mu = -3 (parameter controlling asymmetry), sigma = 4 (scale parameter controlling spread), and nu = 0.4 (shape parameter governing tail heaviness).

# Define AR(1) model with NIG noise
ar1_nig <- f(time_index, 
  model = "ar1", 
  rho = 0.5,
  noise = noise_nig(mu = -3, sigma = 4, nu = 0.4)
)

We then generate a sample realization using the simulate() method with a different random seed (seed = 16) to ensure samples independent of our Gaussian example. The resulting time series will exhibit the characteristic features of NIG innovations while maintaining the AR(1) temporal correlation structure specified by $\rho = 0.5$.

# Generate realization from the NIG AR(1) model
W_nig <- simulate(ar1_nig, seed = 16, nsim = 1)[[1]]

We will now perform the same correlation analysis for the NIG AR(1) process. This allows us to compare how the non-Gaussian innovation distribution affects the temporal correlation structure compared to the Gaussian case:

# Prepare data for NIG process
nig_data <- data.frame(
  time = time_index,
  value = W_nig
)

# Calculate ACF for NIG process
acf_nig <- acf(W_nig, lag.max = 20, plot = FALSE)
acf_nig_data <- data.frame(
  lag = 0:20,
  acf = as.numeric(acf_nig$acf)
)

# Calculate and plot partial autocorrelation function for NIG process
pacf_nig <- pacf(W_nig, lag.max = 20, plot = FALSE)
pacf_nig_data <- data.frame(
  lag = 1:20,
  pacf = as.numeric(pacf_nig$acf)
)

Next, we visualize the NIG AR(1) process and its correlation functions. These plots will demonstrate that while the innovation distribution is non-Gaussian, the temporal correlation structure remains consistent with AR(1) theory, showing the robustness of the autoregressive framework across different noise distributions:

# Create time series plot for NIG
p4 <- ggplot(nig_data, aes(x = time, y = value)) +
  geom_line(color = "#A23B72", alpha = 0.8, linewidth = 0.3) +
  labs(
    title = "AR(1) Process with NIG Noise",
    subtitle = expression(paste(rho, " = 0.5, NIG(", mu, " = -3, ", sigma, " = 4, ", nu, " = 0.4)")),
    x = "Time",
    y = "Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    plot.subtitle = element_text(size = 10),
    panel.grid.minor = element_blank()
  )

# Plot autocorrelation function for NIG process
p5 <- ggplot(acf_nig_data, aes(x = lag, y = acf)) +
  geom_hline(yintercept = 0, color = "gray50") +
  geom_hline(yintercept = c(-0.2, 0.2), color = "blue", linetype = "dashed", alpha = 0.4, linewidth = 0.4) +
  geom_segment(aes(xend = lag, yend = 0), color = "#A23B72", linewidth = 0.4) +
  geom_point(color = "#A23B72", size = 1) +
  labs(
    title = "Sample Autocorrelation Function (NIG)",
    x = "Lag",
    y = "ACF"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    panel.grid.minor = element_blank()
  )

# Plot partial autocorrelation function for NIG process
p6 <- ggplot(pacf_nig_data, aes(x = lag, y = pacf)) +
  geom_hline(yintercept = 0, color = "gray50") +
  geom_hline(yintercept = c(-0.2, 0.2), color = "blue", linetype = "dashed", alpha = 0.4, linewidth = 0.4) +
  geom_segment(aes(xend = lag, yend = 0), color = "#A23B72", linewidth = 0.4) +
  geom_point(color = "#A23B72", size = 1) +
  labs(
    title = "Sample Partial Autocorrelation Function (PACF)",
    x = "Lag",
    y = "PACF"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    panel.grid.minor = element_blank()
  )

# Combine plots
grid.arrange(p4, p5, p6, ncol = 1)

The NIG AR(1) process exhibits a temporal correlation structure similar to that of the Gaussian case, as evidenced by the exponential decay in the ACF and the cutoff after lag 1 in the PACF. However, the non-Gaussian innovations produce different marginal behavior, allowing for asymmetric and heavy-tailed distributions while preserving the fundamental AR(1) temporal dynamics.

Comparing Distributions

A key advantage of the NIG distribution is its ability to model asymmetric and heavy-tailed behavior. To better understand the differences between Gaussian and NIG innovations, we will prepare the data for a direct comparison of the marginal distributions. This analysis will highlight how the choice of noise distribution affects the overall behavior of the AR(1) process:

# Combine data for comparison
comparison_data <- data.frame(
  value = c(W_gaussian, W_nig),
  type = rep(c("Gaussian", "NIG"), each = n_obs)
)

Now we create comparative visualizations to examine the distributional differences between the two AR(1) processes. The histogram shows the overall shape and asymmetry, while the box plots highlight differences in central tendency, spread, and the presence of outliers:

# Create comparison plots
p7 <- ggplot(comparison_data, aes(x = value, fill = type)) +
  geom_histogram(alpha = 0.6, bins = 50, position = "identity") +
  scale_fill_manual(values = c("Gaussian" = "#2E86AB", "NIG" = "#A23B72")) +
  labs(
    title = "Distribution Comparison",
    x = "Value",
    y = "Frequency",
    fill = "Noise Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    legend.position = "top"
  )

# Box plot comparison
p8 <- ggplot(comparison_data, aes(x = type, y = value, fill = type)) +
  geom_boxplot(alpha = 0.6, outlier.alpha = 0.5) +
  scale_fill_manual(values = c("Gaussian" = "#2E86AB", "NIG" = "#A23B72")) +
  labs(
    title = "Distribution Comparison (Box Plots)",
    x = "Noise Type",
    y = "Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12, face = "bold"),
    legend.position = "none"
  )

grid.arrange(p7, p8, ncol = 2)

The comparison reveals how the NIG distribution captures different tail behavior and potential asymmetry compared to the Gaussian case.
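
To quantify these differences, we can compute the sample skewness and excess kurtosis of both series by hand (a small sketch that avoids an extra package dependency). For the Gaussian process both quantities should be close to zero, while the NIG parameters chosen here should produce noticeable skewness and heavier tails:

# Sample skewness and excess kurtosis (moment-based estimators)
skewness <- function(z) {
  m <- mean(z); s <- sqrt(mean((z - m)^2))
  mean((z - m)^3) / s^3
}
exc_kurtosis <- function(z) {
  m <- mean(z); s <- sqrt(mean((z - m)^2))
  mean((z - m)^4) / s^4 - 3
}
sapply(list(Gaussian = W_gaussian, NIG = W_nig),
       function(z) c(skewness = skewness(z), exc_kurtosis = exc_kurtosis(z)))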

Estimation

Model estimation in ngme2 involves fitting AR(1) models to observed data using advanced optimization techniques. We will demonstrate this process using our simulated NIG data combined with fixed effects and measurement noise.

Preparing the Data

Let us create a realistic dataset with fixed effects, latent AR(1) structure, and measurement noise:

# Define fixed effects and covariates
feff <- c(-1, 2, 0.5)
x1 <- runif(n_obs)
x2 <- rexp(n_obs)
x3 <- rnorm(n_obs, mean = 1, sd = 0.5)
X <- model.matrix(~ 0 + x1 + x2 + x3)

# Create response variable with fixed effects, latent process, and measurement noise
Y <- as.numeric(X %*% feff) + W_nig + rnorm(n_obs, sd = 2)

# Prepare data frame
model_data <- data.frame(x1 = x1, x2 = x2, x3 = x3, Y = Y)

Model Specification and Fitting

We specify our model using the ngme() function, incorporating both fixed effects and the latent AR(1) process. The ngme() function follows R's standard modeling approach by accepting a formula object that defines the relationship between the response variable and both fixed and latent effects:

# Fit the model
ngme_out <- ngme(
  Y ~ 0 + x1 + x2 + x3 + f(
    1:n_obs,
    name = "my_ar",
    model = "ar1",
    noise = noise_nig(),
    control = control_f(numer_grad = TRUE, improve_hessian = FALSE)
  ),
  data = model_data,
  control_opt = control_opt(
    optimizer = adam(),
    burnin = 100,
    iterations = 1000,
    std_lim = 0.01,
    n_parallel_chain = 4,
    stop_points = 10,
    verbose = FALSE,
    print_check_info = FALSE,
    seed = 3
  )
)

The formula Y ~ 0 + x1 + x2 + x3 + f(...) specifies that the response variable Y depends on three fixed effects (x1, x2, x3) with no intercept (the 0 term), plus a latent AR(1) process defined through the f() function. The latent model is specified with f(1:n_obs, name = "my_ar", model = "ar1", noise = noise_nig()), where we assign a name to the process for later reference and specify NIG innovations without fixing the parameters (allowing them to be estimated).

The control_opt() function manages the optimization process through several key parameters:

  • optimizer = adam(): Uses the Adam optimizer, an adaptive gradient-based algorithm well-suited for complex likelihood surfaces
  • iterations = 1000: Sets the maximum number of optimization iterations
  • burnin = 100: Specifies initial iterations for algorithm stabilization before convergence assessment
  • n_parallel_chain = 4: Runs 4 parallel optimization chains to improve convergence reliability and parameter estimation robustness
  • std_lim = 0.01: Sets the convergence criterion based on parameter standard deviation across chains
  • stop_points = 10: Number of intermediate convergence checks during optimization
  • seed = 3: Ensures reproducible results across runs

This optimization setup balances computational efficiency with estimation reliability, particularly important for complex models with non-Gaussian latent processes. Let us examine the fitted model results:

# Display model summary
ngme_out
#> *** Ngme object ***
#> 
#> Fixed effects: 
#>     x1     x2     x3 
#> -0.928  2.043  0.519 
#> 
#> Models: 
#> $my_ar
#>   Model type: AR(1)
#>       rho = 0.483
#>   Noise type: NIG
#>   Noise parameters: 
#>       mu = -2.87
#>       sigma = 3.89
#>       nu = 0.407
#> 
#> Measurement noise: 
#>   Noise type: NORMAL
#>   Noise parameters: 
#>       sigma = 2.03

Convergence Diagnostics

Assessing convergence is crucial for reliable parameter estimates. We will use the built-in traceplot function from ngme2 to examine the convergence of our parameters:

# Trace plot for sigma and fixed effects (blue lines indicate true value)
traceplot(ngme_out, hline = c(2,-1, 2, 0.5), moving_window = 20)

#> Last estimates:
#> $sigma
#> [1] 2.016504
#> 
#> $`fixed effect 1`
#> [1] -0.9287274
#> 
#> $`fixed effect 2`
#> [1] 2.042297
#> 
#> $`fixed effect 3`
#> [1] 0.5200857

Now let us examine the convergence of the AR(1) process parameters:

# Trace plots for AR(1) parameters (blue lines indicate true value)
traceplot(ngme_out, "my_ar", hline = c(0.5, -3, 4, 0.4))

#> Last estimates:
#> $rho
#> [1] 0.4833267
#> 
#> $mu
#> [1] -2.871614
#> 
#> $sigma
#> [1] 3.893367
#> 
#> $nu
#> [1] 0.4076381

Parameter Estimates Summary

Let us examine the final parameter estimates and compare them with the true values using the ngme_result() function:

# Parameter Estimates for Fixed Effects and Measurement Noise
params_data <- ngme_result(ngme_out, "data")

# Fixed Effects
print(data.frame(
  Truth = c(-1, 2, 0.5),
  Estimates = as.numeric(params_data$fixed_effects),
  row.names = names(params_data$fixed_effects)
))
#>    Truth  Estimates
#> x1  -1.0 -0.9276886
#> x2   2.0  2.0425700
#> x3   0.5  0.5188674

# Measurement Noise
print(data.frame(Truth = 2.0, Estimate = params_data$sigma, row.names = "sigma"))
#>       Truth Estimate
#> sigma     2 2.027015
cat("\n")

# AR(1) Process Parameters
params_ar1 <- ngme_result(ngme_out, "my_ar")

print(data.frame(
  Truth = c(rho = 0.5, mu = -3.0, sigma = 4.0, nu = 0.4),
  Estimates = unlist(params_ar1)
))
#>       Truth  Estimates
#> rho     0.5  0.4832092
#> mu     -3.0 -2.8699542
#> sigma   4.0  3.8940500
#> nu      0.4  0.4069202

Model Validation

Finally, let us validate the model by plotting the estimated noise distribution against the true noise distribution:

# Distribution Validation: Fitted vs. True NIG Noise Model
with(params_ar1, {
  plot(
    noise_nig(mu = mu, sigma = sigma, nu = nu),
    noise_nig(mu = -3, sigma = 4, nu = 0.4)
  )
})

Summary

The estimation results demonstrate the effectiveness of the ngme2 framework for fitting complex AR(1) models with non-Gaussian innovations. Key findings include:

  • Parameter Recovery: The estimated parameters closely match the true values used in simulation
  • Convergence: The trace plots show good mixing and convergence behavior
  • Distribution Fit: The estimated NIG distribution captures the true noise characteristics

This validation confirms that the estimation procedure successfully recovered the underlying model parameters, providing confidence in the methodology for real-world applications where the true parameters are unknown.