
Introduction

Cross-validation is a fundamental tool for model selection and assessment, allowing us to evaluate how well different model specifications generalize to unseen data. In the context of spatial and temporal models, proper cross-validation requires careful attention to the structure of dependence in the data.

This vignette demonstrates cross-validation approaches for both:

  1. Time series models: where temporal ordering must be respected, comparing Ornstein-Uhlenbeck and Whittle-Matérn processes with NIG noise.
  2. Spatial models: where the spatial correlation structure must be preserved, using SPDE Matérn models with NIG noise and spatial scales governed either by a stationary range parameter or by a nonstationary, spatially varying range parameter.

We use the ngme2 package’s flexible cross-validation framework, which supports custom splitting strategies that honor the dependence structure inherent in spatial and temporal data. A key feature is the use of the Whittle-Matérn SPDE framework with non-Gaussian Type-G Lévy noise (NIG, GAL), which provides a unified approach to modeling both temporal (1D) and spatial (2D) correlation structures with heavy tails, skewness, and jumps through stochastic partial differential equations.

Time Series Cross-validation

Temporal Process Models

The ngme2 package supports several temporal process models with non-Gaussian Type-G Lévy noise. In this vignette, we compare two approaches:

Ornstein-Uhlenbeck (OU) Process: A continuous-time mean-reverting process driven by the SDE $dX(t) = -\theta X(t)\,dt + \sigma\, d\mathcal{M}(t)$, where $\mathcal{M}$ is a Type-G Lévy process (NIG, GAL). See the OU vignette for details.

Whittle-Matérn Process: Based on the SPDE $(\kappa^2 - \frac{d^2}{dt^2})^{\alpha/2} X(t) = \dot{\mathcal{M}}(t)$ on a bounded domain, where $\dot{\mathcal{M}}$ is Type-G Lévy noise. For $\alpha = 2$ in 1D, this approximates a Matérn covariance structure with smoothness $\nu = 1.5$. See the SPDE vignette for details.

Key distinction: The OU process corresponds to $\nu = 0.5$ (rougher paths), while the Whittle-Matérn with $\alpha = 2$ corresponds to $\nu = 1.5$ (smoother paths). Both are driven by non-Gaussian NIG noise in this example.
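These smoothness values follow from the standard relation between the SPDE exponent and the Matérn smoothness on a domain of dimension $d$ (Lindgren et al., 2011):

$$\nu = \alpha - \frac{d}{2}.$$

In 1D ($d = 1$), $\alpha = 1$ recovers the OU case $\nu = 0.5$, while $\alpha = 2$ gives $\nu = 1.5$.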

For additional time series models available in ngme2, see the AR1 model and Random Walk model vignettes.

Generating Simulated OU Data

We begin by simulating a time series from an OU process with NIG noise.

set.seed(123)

# Define OU parameters
n_obs <- 200
theta_true <- 0.15 # Mean-reversion parameter

# NIG noise parameters
mu_true <- 0.5
sigma_true <- 2.0
nu_true <- 1.0

# Create mesh for time series (required for SPDE models)
time_mesh <- fmesher::fm_mesh_1d(
  loc = 1:n_obs,
  boundary = "free"
)

# Create OU model specification for simulation with NIG noise
ou_spec <- f(
  map = 1:n_obs,
  model = ou(theta = theta_true),
  noise = noise_nig(mu = mu_true, sigma = sigma_true, nu = nu_true)
)

# Simulate the latent OU process
sim_ou <- simulate(ou_spec, seed = 123)[[1]]

# Add measurement noise to create observations
measurement_sd <- 0.5
y <- sim_ou + rnorm(n_obs, mean = 0, sd = measurement_sd)

Now we visualize the simulated time series:

# Prepare visualization data
plot_data <- data.frame(
  time = 1:n_obs,
  observed = y,
  latent = sim_ou
)

ggplot(plot_data, aes(x = time)) +
  geom_line(aes(y = latent, color = "True Latent OU Process"),
    linewidth = 0.6, alpha = 0.7
  ) +
  geom_line(aes(y = observed, color = "Observed Data"),
    linewidth = 0.4, alpha = 0.8
  ) +
  scale_color_manual(values = c(
    "True Latent OU Process" = "#2E86AB",
    "Observed Data" = "#A23B72"
  )) +
  labs(
    title = "Simulated OU Time Series with NIG Noise",
    subtitle = sprintf(
      "θ = %.2f, NIG(μ=%.1f, σ=%.1f, ν=%.1f)",
      theta_true, mu_true, sigma_true, nu_true
    ),
    x = "Time",
    y = "Value",
    color = ""
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11),
    legend.position = "top"
  )

The make_time_series_cv_index() Function

The ngme2 package provides the make_time_series_cv_index() function to create training and testing indices that respect temporal ordering. This function is essential for proper time series cross-validation.

Function Arguments

The main arguments of make_time_series_cv_index() are:

  • time_idx: A numeric vector of time indices in ascending order (e.g., 1:n_obs)
  • train_length: Fixed length of training sets
    • If NULL (default): uses expanding window approach (training set grows)
    • If specified (e.g., 50): uses sliding window approach (fixed training size)
  • test_length: Number of observations in each test set (forecast horizon). Default is 1
  • gap: Gap between training and test sets. Default is 0 (test starts immediately after training)
  • replicate: Optional vector for grouping observations that should stay together

Understanding Expanding vs Sliding Windows

Expanding Window (train_length = NULL):
The training set continuously grows, incorporating all available historical data. Each fold trains on all past data up to that point. This approach is appropriate when all past observations are equally relevant for prediction.

Sliding Window (train_length = fixed value):
The training window size remains constant at the specified value. As time progresses, old observations are dropped and new ones are added. This is useful when recent data is more relevant than distant past observations, or when the data-generating process may change over time.
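As a quick illustration of the two window types and of the structure returned (a list with train and test index lists, as used throughout this vignette), here is a minimal sketch on a toy series of 12 time points:

# Minimal sketch on a toy series of 12 time points (not used later);
# the result is a list whose $train and $test elements hold index vectors
toy_expanding <- make_time_series_cv_index(
  time_idx = 1:12,
  train_length = NULL, # expanding window: training set grows each fold
  test_length = 3
)
str(toy_expanding$train[1:2]) # first two (growing) training index vectors
str(toy_expanding$test[1:2]) # corresponding 3-step-ahead test index vectors

toy_sliding <- make_time_series_cv_index(
  time_idx = 1:12,
  train_length = 5, # sliding window: each training set keeps 5 observations
  test_length = 3
)
str(toy_sliding$train[1:2]) # training vectors now all have length 5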

Creating Cross-validation Indices

We create indices for both expanding and sliding window approaches:

# Create indices using the EXPANDING WINDOW approach
cv_indices_expanding <- make_time_series_cv_index(
  time_idx = 1:n_obs, # Time indices: 1 to n_obs (= 200)
  train_length = NULL, # NULL = expanding window
  test_length = 10, # Predict 10 steps ahead in each fold
  gap = 0 # No gap between train and test
)

# Subsetting so that each training set has at least 150 observations
min_obs_train <- 150
total_train_test <- length(cv_indices_expanding$train)

cv_indices_expanding$train <-
  cv_indices_expanding$train[min_obs_train:total_train_test]
cv_indices_expanding$test <-
  cv_indices_expanding$test[min_obs_train:total_train_test]

# Create indices using the SLIDING WINDOW approach for comparison
cv_indices_sliding <- make_time_series_cv_index(
  time_idx = 1:n_obs,
  train_length = 50, # Fixed training window of 50 observations
  test_length = 10,
  gap = 0
)

# Subsetting to keep only the splits from the 100th onward
min_idx_train <- 100
total_train_test <- length(cv_indices_sliding$train)

cv_indices_sliding$train <-
  cv_indices_sliding$train[min_idx_train:total_train_test]
cv_indices_sliding$test <-
  cv_indices_sliding$test[min_idx_train:total_train_test]

# We'll use the expanding window for our analysis
cv_indices <- cv_indices_expanding

The structure of the created folds is summarized below:

Comparison of expanding vs sliding window cross-validation
Approach Characteristic Value
Expanding Window Number of folds 41
Expanding Window Fold 3 training size 152
Sliding Window Number of folds 42
Sliding Window Fold 3 training size 50

Notice how the expanding window accumulates more data with each fold, while the sliding window maintains a constant training set size. The expanding window approach typically provides more stable predictions as it uses all available historical information.

Visualizing Training and Testing Sets

To understand how time series cross-validation works, let’s visualize the training (blue) and testing (orange) sets for each fold:

# Create visualization data frame
folds_viz <- data.frame()

for (i in 1:length(cv_indices$train)) {
  # Add training points
  train_df <- data.frame(
    time = cv_indices$train[[i]],
    fold = i,
    set_type = "Training"
  )

  # Add testing points
  test_df <- data.frame(
    time = cv_indices$test[[i]],
    fold = i,
    set_type = "Testing"
  )

  folds_viz <- rbind(folds_viz, train_df, test_df)
}

# Create visualization
ggplot(folds_viz, aes(x = time, y = factor(fold), color = set_type)) +
  geom_point(size = 1.5, alpha = 0.7, shape = 15) +
  scale_color_manual(
    values = c("Training" = "#2E86AB", "Testing" = "#F18F01"),
    name = "Set Type"
  ) +
  labs(
    title = "Time Series Cross-validation: Expanding Window Approach",
    subtitle = "Training set expands over time, always predicting future observations",
    x = "Time Index",
    y = "Fold Number"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11),
    legend.position = "top",
    panel.grid.minor = element_blank()
  )

This visualization clearly shows the fundamental principle: training data always precedes testing data, mimicking real forecasting scenarios.

For comparison, here is the sliding window approach:

# Build visualization data for the sliding window
folds_viz_sliding <- data.frame()

for (i in 1:length(cv_indices_sliding$train)) {
  # Training
  train_df <- data.frame(
    time = cv_indices_sliding$train[[i]],
    fold = i,
    set_type = "Training"
  )

  # Testing
  test_df <- data.frame(
    time = cv_indices_sliding$test[[i]],
    fold = i,
    set_type = "Testing"
  )

  folds_viz_sliding <- rbind(folds_viz_sliding, train_df, test_df)
}

# Plot sliding window
ggplot(folds_viz_sliding, aes(x = time, y = factor(fold), color = set_type)) +
  geom_point(size = 1.5, alpha = 0.7, shape = 15) +
  scale_color_manual(
    values = c("Training" = "#2E86AB", "Testing" = "#F18F01"),
    name = "Set Type"
  ) +
  labs(
    title = "Time Series Cross-validation: Sliding Window Approach",
    subtitle = "Training set has fixed width, model always trained on the most recent history",
    x = "Time Index",
    y = "Fold Number"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11),
    legend.position = "top",
    panel.grid.minor = element_blank()
  )

Fitting Competing Models

We compare two temporal process models, both with NIG noise:

Model 1: OU Process (True Model)

# Prepare data
data_df <- data.frame(y = y, time_idx = 1:n_obs)

# Fit OU model with NIG noise (the true model)
fit_ou <- ngme(
  y ~ 0 + f(time_idx, model = ou(), noise = noise_nig()),
  data = data_df,
  control_opt = control_opt(
    iterations = 2000,
    n_parallel_chain = 4,
    std_lim = 0.01,
    seed = 123
  )
)

Model 2: Whittle-Matérn with α = 2

# Fit Whittle-Matérn model with alpha = 2 and NIG noise
fit_matern <- ngme(
  y ~ 0 + f(
    time_idx,
    model = matern(
      mesh = time_mesh,
      alpha = 2
    ),
    noise = noise_nig()
  ),
  data = data_df,
  control_opt = control_opt(
    iterations = 2000,
    n_parallel_chain = 4,
    std_lim = 0.01,
    seed = 456
  )
)

Performing Time Series Cross-validation

Now we perform cross-validation using the indices from make_time_series_cv_index(). For each fold, both models are fitted on the training set and evaluated on the test set:

# Perform cross-validation
cv_results <- cross_validation(
  list(
    "OU Process (ν=0.5)" = fit_ou,
    "Whittle-Matérn (ν=1.5)" = fit_matern
  ),
  type = "custom",
  train_idx = cv_indices$train,
  test_idx = cv_indices$test,
  N_sim = 5, # Number of simulations for each fold
  n_gibbs_samples = 500, # Gibbs samples for CRPS computation
  seed = 123
)

The cross-validation results are presented below:

Time series cross-validation results (expanding window)
MAE MSE neg.CRPS neg.sCRPS
OU Process (ν=0.5) 2.5949 11.2821 1.8346 1.642
Whittle-Matérn (ν=1.5) 5.4638 51.8733 3.8706 1.997

Understanding the Performance Metrics

MAE (Mean Absolute Error): Average absolute difference between predictions and observations. Interpretable in original units and robust to outliers.

MSE (Mean Squared Error): Average squared difference. Penalizes large errors more heavily than MAE.

Negative CRPS (Continuous Ranked Probability Score): Proper scoring rule evaluating the entire predictive distribution (Gneiting & Raftery, 2007). Lower values are better.

Negative Scaled CRPS: A variant of CRPS designed to be locally scale invariant, meaning observations with different levels of predictive uncertainty contribute comparably to the average score.
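For reference, for a predictive distribution $F$ and an observation $y$, the CRPS has the standard representation (Gneiting & Raftery, 2007); the Gibbs samples drawn during cross-validation are presumably used to approximate these expectations by Monte Carlo:

$$\mathrm{CRPS}(F, y) = \mathbb{E}_F|X - y| - \tfrac{1}{2}\,\mathbb{E}_F|X - X'|, \qquad X, X' \overset{\text{iid}}{\sim} F,$$

which, given $m$ predictive draws $x_1, \dots, x_m$, can be estimated by

$$\widehat{\mathrm{CRPS}}(F, y) = \frac{1}{m}\sum_{i=1}^{m} |x_i - y| - \frac{1}{2m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} |x_i - x_j|.$$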

Cross-validation Conclusions

The results demonstrate that:

  1. OU achieves best performance: The true data-generating model shows lower errors across all metrics, validating the cross-validation approach.

  2. Smoothness matters: The OU process (less smooth) better matches the true process compared to the smoother Whittle-Matérn specification.

  3. All metrics agree: MAE, MSE, CRPS, and sCRPS provide consistent model rankings.

Key principle: Cross-validation respecting temporal structure successfully identifies models that best capture temporal correlation and smoothness properties.

Spatial Cross-validation with SPDE Models

Spatial SPDE Matérn Models

The ngme2 package supports spatial Matérn models through a non-Gaussian extension of the SPDE approach of Lindgren et al. (2011), which represents a Matérn field as the solution of $(\kappa(s)^2 - \Delta)^{\alpha/2} X(s) = \dot{\mathcal{M}}(s)$, where $\dot{\mathcal{M}}(s)$ is a Type-G Lévy noise input (NIG or GAL). This formulation yields sparse precision matrices and scalable inference. See the full SPDE vignette for details.

Stationary SPDE Matérn model
Assumes a constant range parameter $\kappa(s) = \kappa$, so the spatial field has a homogeneous correlation range across the domain. This is the simplest and most commonly used specification.

Non-stationary SPDE Matérn model
Allows the range parameter to vary spatially via $\log \kappa(s) = B(s)\,\theta_\kappa$, where $B(s)$ contains spatial basis functions. This allows the model to capture heterogeneous spatial scales, with shorter correlation ranges where $\kappa(s)$ is large and longer ranges where $\kappa(s)$ is small.

Key distinction
The stationary model uses one global range parameter, while the non-stationary model estimates spatially varying ranges through basis functions. Both models can be driven by non-Gaussian Type-G noise, allowing heavy tails and asymmetry in the spatial field.
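A commonly used rule of thumb from the SPDE literature links $\kappa$ to an interpretable spatial scale through the practical correlation range

$$\rho \approx \frac{\sqrt{8\nu}}{\kappa},$$

the distance at which the Matérn correlation drops to roughly 0.1. This is why regions with large $\kappa(s)$ exhibit short correlation ranges and regions with small $\kappa(s)$ exhibit long ones.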

Generating Simulated Spatial Data

We now turn to spatial data and SPDE models. To demonstrate the importance of proper model specification, we generate data from a non-stationary spatial process whose correlation range varies spatially, and then compare stationary and non-stationary models.

Creating the spatial mesh

First, we construct a triangulation mesh for the SPDE approach.

# Define a rectangular domain
set.seed(42)
domain_boundary <- cbind(
  c(0, 10, 10, 0, 0),
  c(0, 0, 5, 5, 0)
)

# Create mesh with fmesher
mesh <- fm_mesh_2d(
  loc.domain = domain_boundary,
  cutoff = 0.15, # Minimum distance between points
  max.edge = c(0.4, 1.5) # Maximum triangle edge length
)

The mesh properties are summarised below.

Spatial mesh properties
Property Value
Number of vertices 1236
Number of triangles 2419

We can visualise the mesh structure.

plot(mesh, main = "Spatial mesh for SPDE model", asp = 1)
points(domain_boundary, type = "l", col = "red", lwd = 2)

Defining spatially varying parameters

We now create a non-stationary scenario where both the spatial range parameter $\kappa(s)$ and the dispersion parameter $\sigma(s)$ of the NIG noise vary across space. Larger values of $\kappa(s)$ correspond to shorter correlation ranges, while $\sigma(s)$ controls the local dispersion of the non-Gaussian noise (although it is not a standard deviation). The variation is introduced through basis expansions for $\log \kappa(s)$ and $\log \sigma(s)$, each with its own set of coefficients.

# Extract mesh coordinates
mesh_coords <- mesh$loc[, 1:2]

# ------------------------------------------------------
# Basis functions for spatially varying kappa
# ------------------------------------------------------
B_kappa <- cbind(
  Intercept = 1,
  Longitude = mesh_coords[, 1] / 10,
  Long_squared = (mesh_coords[, 1] / 10)^2
)

# True coefficients for log(kappa)
true_theta_kappa <- c(3, 1, -1)

# Compute kappa(s)
log_kappa_true <- B_kappa %*% true_theta_kappa
kappa_true <- exp(as.vector(log_kappa_true))

# ------------------------------------------------------
# Basis functions for spatially varying sigma
# ------------------------------------------------------
B_sigma <- cbind(
  Intercept = 1,
  Latitude = mesh_coords[, 2] / 5,
  Lat_squared = (mesh_coords[, 2] / 5)^2
)

# True coefficients for log(sigma)
true_theta_sigma <- c(6, 0.5, -0.5)

log_sigma_true <- B_sigma %*% true_theta_sigma
sigma_true <- exp(as.vector(log_sigma_true))

# Data frame for plotting
param_df <- data.frame(
  x = mesh_coords[, 1],
  y = mesh_coords[, 2],
  kappa = kappa_true,
  sigma = sigma_true
)

The spatial variation in $\kappa(s)$ and $\sigma(s)$ is summarised below.

Spatial variation in kappa and sigma
Parameter Min Mean Max
kappa 14.970 22.952 25.790
sigma 284.182 421.158 457.145

We now visualise the spatial variation in both parameters.

p_kappa <- ggplot(param_df, aes(x = x, y = y, colour = kappa)) +
  geom_point(size = 2, alpha = 0.7) +
  scale_color_viridis(option = "plasma") +
  coord_fixed() +
  labs(
    title = "Spatial variation in kappa",
    subtitle = "Larger kappa → shorter correlation range",
    x = "Longitude", y = "Latitude", colour = "kappa"
  ) +
  theme_minimal()

p_sigma <- ggplot(param_df, aes(x = x, y = y, colour = sigma)) +
  geom_point(size = 2, alpha = 0.7) +
  scale_color_viridis(option = "magma") +
  coord_fixed() +
  labs(
    title = "Spatial variation in sigma",
    subtitle = "Dispersion parameter of NIG noise (not a standard deviation)",
    x = "Longitude", y = "Latitude", colour = "sigma"
  ) +
  theme_minimal()

p_kappa

p_sigma

Simulating observations from the non-stationary model

set.seed(789)

# Observation locations
n_obs <- 600
loc_obs <- cbind(
  runif(n_obs, 0, 10),
  runif(n_obs, 0, 5)
)

# Noise with spatially varying sigma
true_noise <- noise_nig(
  mu = -2.5,
  B_sigma = B_sigma,
  theta_sigma = true_theta_sigma,
  nu = 4
)

# Full non-stationary SPDE model
true_model <- f(
  map = loc_obs,
  model = matern(
    alpha = 2,
    mesh = mesh,
    B_kappa = B_kappa,
    theta_kappa = true_theta_kappa
  ),
  noise = true_noise
)

# Simulate latent field
W_true <- simulate(true_model, seed = 456)[[1]]

# Standard deviation of the simulated field
sd(W_true)
#> [1] 1.950052

# Add measurement noise
sigma_obs <- 0.3
Y_spatial <- W_true + rnorm(n_obs, sd = sigma_obs)

Summary statistics for the simulated field.

Simulated spatial field statistics
Statistic Value
Mean 0.054
Standard deviation 1.950
Range (max minus min) 21.569

We visualise the simulated non-stationary spatial field.

sim_df <- data.frame(
  x = loc_obs[, 1],
  y = loc_obs[, 2],
  W = W_true,
  Y = Y_spatial
)

p_field <- ggplot(sim_df, aes(x = x, y = y, colour = W)) +
  geom_point(size = 2.5, alpha = 0.8) +
  scale_color_gradient2(
    low = "#2E86AB",
    mid = "#FFFFFF",
    high = "#A23B72",
    midpoint = 0,
    name = "Field\nvalue"
  ) +
  coord_fixed() +
  labs(
    title = "Simulated non-stationary spatial field",
    subtitle = "True model: Matérn SPDE with spatially varying correlation range",
    x = "X coordinate",
    y = "Y coordinate"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 13, face = "bold"),
    plot.subtitle = element_text(size = 11)
  )

p_field

Fitting competing spatial SPDE models

We now fit two competing models to these simulated data.

  • Model A: stationary SPDE Matérn, which assumes a single global range parameter.
  • Model B: non-stationary SPDE Matérn, which allows $\kappa$ to vary in space using the same basis as in the simulation.

Model A: stationary SPDE Matérn

fit_stationary <- ngme(
  Y ~ 0 + f(
    loc_obs,
    name = "spde_stationary",
    model = matern(
      mesh = mesh
    ),
    noise = noise_nig()
  ),
  data = data.frame(Y = Y_spatial),
  control_opt = control_opt(
    iterations = 2000,
    optimizer = precond_sgd(),
    n_parallel_chain = 4,
    std_lim = 0.01,
    seed = 111
  )
)

Model B: non-stationary SPDE Matérn

fit_nonstationary <- ngme(
  Y ~ 0 + f(
    loc_obs,
    name = "spde_nonstationary",
    model = matern(
      mesh = mesh,
      B_kappa = B_kappa, # Basis for kappa variation
      theta_kappa = c(2, 0, 0) # Starting values for log(kappa) coefficients
    ),
    noise = noise_nig(
      B_sigma = B_sigma, # Basis for sigma variation
      theta_sigma = c(4, 0, 0) # Starting values for log(sigma) coefficients
    )
  ),
  data = data.frame(Y = Y_spatial),
  control_opt = control_opt(
    iterations = 2000,
    optimizer = precond_sgd(),
    n_parallel_chain = 4,
    std_lim = 0.01,
    seed = 222
  )
)

Both spatial models have been successfully fitted. The stationary model estimates a single global value of $\kappa$, while the non-stationary model estimates a spatially varying $\kappa$ through the coefficients of the basis functions defined above.

Spatial cross-validation

Cross-validation in spatial models must account for the fact that observations are not independent, and that prediction performance depends strongly on the spatial configuration of the training and testing sets. The ngme2 framework supports several cross-validation strategies, including K-fold, leave-percentage-out, and fully custom splits. In this vignette we illustrate the K-fold and leave-percentage-out splits, but only the leave-percentage-out (LPO) method is used when computing the actual cross-validation scores.

K-fold is included purely to show how spatial folds are created and how they look when visualized, which helps build intuition about training/testing partitions in non-lattice spatial data.

K-fold spatial cross-validation (illustration only)

Understanding K-fold spatial cross-validation

In K-fold cross-validation, the observations are randomly divided into K approximately equal-sized subsets:

  • The data are partitioned at random into $K$ folds.
  • For each fold $i$:
    • The testing set corresponds to all observations in fold $i$.
    • The training set corresponds to all remaining observations.
  • The procedure is repeated $K$ times, using each fold as the test set exactly once.

Although widely used, K-fold has some limitations for spatial data:

  • Spatially close observations may fall into different folds, violating the idea of “held-out regions”.
  • Test sets are often too small to evaluate large-scale prediction behavior.
  • Training coverage remains high (typically $1 - 1/K$), leading to relatively easy prediction tasks.

K-fold is presented in this vignette because it is a common cross-validation choice and is available as a built-in option in the cross_validation() function.

Creating K-fold partitions

We follow the same splitting logic used internally by the cross_validation() function.

set.seed(888)
k_folds <- 5
n_data <- nrow(loc_obs)

# Index sequence
idx <- seq_len(n_data)

# Random fold assignment
folds <- cut(sample(idx), breaks = k_folds, labels = FALSE)

# Build index lists
test_idx_kfold <- lapply(1:k_folds, function(x) which(folds == x))
train_idx_kfold <- lapply(1:k_folds, function(x) which(folds != x))

Distribution of the number of observations per fold:

Spatial K-fold sizes (illustration only)
Fold Number_of_locations
1 120
2 120
3 120
4 120
5 120

Visualising the spatial K-fold split

The figure below shows how observations are distributed across the K folds.
The folds are not spatially contiguous, which is typical in randomly assigned K-fold for point-referenced spatial data.

df_folds <- data.frame(
  x = loc_obs[, 1],
  y = loc_obs[, 2],
  fold = factor(folds)
)

ggplot(df_folds, aes(x = x, y = y, colour = fold)) +
  geom_point(size = 2.5, alpha = 0.7) +
  scale_color_viridis_d(option = "plasma", name = "Fold") +
  coord_fixed() +
  labs(
    title = "K-fold spatial cross-validation (illustration)",
    subtitle = sprintf("Observations randomly divided into %d folds", k_folds),
    x = "X coordinate", y = "Y coordinate"
  ) +
  theme_minimal()

K-fold is therefore useful as a reference point, but not the scheme used for computing cross-validation scores in this vignette.


Leave-percentage-out (LPO) spatial cross-validation

Main evaluation method: 30 percent for training, 70 percent for testing

Understanding leave-percentage-out cross-validation

Leave-percentage-out (LPO) is often more appropriate for spatial models because it allows the size of training and test sets to be controlled directly.

Key features of LPO:

  • You specify the proportion of data to train on (here 30 percent).
  • The remaining data serve as the test set (here 70 percent).
  • The split is repeated multiple times (here 5 repetitions, to limit computational cost).
  • Each repetition creates a new random partition.

This produces a more difficult prediction task, especially when the training fraction is small, and therefore yields a realistic evaluation of extrapolation ability.

Choosing 30 percent training and 70 percent testing

Using only 30 percent of the observations for training ensures a deliberately challenging scenario:

  • Training data form a sparse set of locations across the domain.
  • Testing data cover most of the spatial region (70 percent), including areas far from training points.
  • Stationary models tend to struggle with such extrapolation, whereas non-stationary SPDE models can adapt via spatially varying range or dispersion.

This makes LPO with a small training fraction a strong diagnostic tool.


Creating LPO partitions

set.seed(707)

percent_train <- 0.30
percent_test <- 0.70
n_reps <- 5
n_data <- nrow(loc_obs)

train_idx_lpo <- vector("list", n_reps)
test_idx_lpo <- vector("list", n_reps)

for (i in 1:n_reps) {
  train_idx_lpo[[i]] <- sample(1:n_data, size = floor(percent_train * n_data))
  test_idx_lpo[[i]] <- setdiff(1:n_data, train_idx_lpo[[i]])
}

Example LPO split (30% train, 70% test)

df_lpo <- data.frame(
  x = loc_obs[, 1],
  y = loc_obs[, 2],
  set_type = ifelse(1:n_data %in% train_idx_lpo[[1]],
    "Train (30%)", "Test (70%)"
  )
)

ggplot(df_lpo, aes(x = x, y = y, colour = set_type, size = set_type)) +
  geom_point(alpha = 0.6) +
  scale_color_manual(
    values = c("Train (30%)" = "#2E86AB", "Test (70%)" = "#F18F01"),
    name = "Set type"
  ) +
  scale_size_manual(values = c("Train (30%)" = 2, "Test (70%)" = 3), guide = "none") +
  coord_fixed() +
  labs(
    title = "Example leave-percentage-out partition",
    subtitle = "30% training, 70% testing",
    x = "X coordinate", y = "Y coordinate"
  ) +
  theme_minimal()


Performing LPO cross-validation (main results)

cv_spatial_lpo <- cross_validation(
  list(
    "Stationary SPDE"     = fit_stationary,
    "Non-stationary SPDE" = fit_nonstationary
  ),
  type = "lpo",
  percent = percent_test,
  times = n_reps,
  n_gibbs_samples = 500,
  seed = 2025
)
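
For completeness, the explicit LPO indices constructed above could instead be passed through the custom interface used earlier for the time series comparison. The following is a sketch only, assuming cross_validation() accepts the same type = "custom" arguments for spatial fits as it does for temporal ones:

# Sketch (assumption): reuse the manually built LPO splits via type = "custom",
# mirroring the time series cross-validation call earlier in this vignette
cv_spatial_custom <- cross_validation(
  list(
    "Stationary SPDE"     = fit_stationary,
    "Non-stationary SPDE" = fit_nonstationary
  ),
  type = "custom",
  train_idx = train_idx_lpo,
  test_idx = test_idx_lpo,
  n_gibbs_samples = 500,
  seed = 2025
)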

LPO results: 5 repetitions, 30 percent training

Leave-percentage-out spatial cross-validation (30% train, 70% test, 5 repetitions)
MAE MSE neg.CRPS neg.sCRPS
Stationary SPDE 1.3966 3.9631 1.1184 1.7393
Non-stationary SPDE 1.3961 3.9623 1.1859 2.3799