cross_validation(
  ngme,
  type = "k-fold",
  seed = NULL,
  print = FALSE,
  N_sim = 5,
  n_gibbs_samples = 500,
  n_burnin = 100,
  k = 5,
  percent = 0.2,
  times = 10,
  transform = identity,
  test_idx = NULL,
  train_idx = NULL,
  keep_pred = FALSE,
  parallel = FALSE,
  thining_gap = 1,
  cores_layer1 = if (parallel) min(parallel::detectCores(), 2) else 1,
  cores_layer2 = if (parallel) min(parallel::detectCores(), 2) else 1,
  merge_groups = FALSE,
  merged_group_name = NULL
)

Arguments

ngme

an ngme object, or a list of ngme objects (if comparing multiple models)

type

character, one of c("k-fold", "loo", "lpo", "custom"):

  • "k-fold" - k-fold cross-validation; provide k

  • "loo" - leave-one-out

  • "lpo" - leave-percent-out; provide percent (between 0 and 1)

  • "custom" - user-defined groups; provide test_idx and train_idx

seed

random seed

print

logical, print information during computation

N_sim

integer, number of simulations (e.g., MAE, MSE, etc. are each estimated N_sim times)

n_gibbs_samples

number of Gibbs samples of the latent process, used for computing CRPS and sCRPS

n_burnin

number of burn-in iterations

k

integer, number of folds (only for k-fold type)

percent

proportion of the data held out for testing, between 0 and 1 (only for lpo type)

times

number of test cases (only for lpo type)

transform

a function or a list of functions (length equal to the number of models) that maps predictions and observations to the comparison scale (e.g., identity for the original scale, exp if the model is on the log scale). For example, the MAE is computed as |transform(Y) - transform(Y_pred)|

test_idx

a list of indices of the data: which data points are to be predicted (only for custom type)

train_idx

a list of indices of the data: which data points are used for re-sampling, not re-estimation (only for custom type)

keep_pred

logical; if TRUE, keep the test information (pred_1, pred_2) as attributes of the returned object. pred_1 and pred_2 are the predictions of the two chains

parallel

logical, run in parallel mode

thining_gap

integer, the gap between samples for thinning; if 0, no thinning; if 1, keep 50% of the samples

cores_layer1

integer, number of cores for the first layer (over testing samples)

cores_layer2

integer, number of cores for the second layer (over computing scores for N_sim simulations)

merge_groups

logical, if TRUE, merge groups as vector components (e.g., for vector-valued wind data with north_wind, east_wind). MAE becomes Euclidean distance, MSE becomes squared Euclidean distance, etc.

merged_group_name

character, name for the merged group when merge_groups=TRUE. If NULL, uses "group1_group2" format (default: NULL)
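The effect of transform and merge_groups on the error scores can be illustrated in base R. This is a minimal sketch with made-up numbers, not the package's internal code: y_log, pred_log, north_err, and east_err are hypothetical, and the merged MAE is assumed to be the mean Euclidean norm of the component-wise errors.

```r
# Hypothetical observations and predictions on the log scale
y_log    <- log(c(1, 2, 4))
pred_log <- log(c(1, 4, 2))

# MAE on the model (log) scale, as with transform = identity
mae_identity <- mean(abs(identity(y_log) - identity(pred_log)))

# MAE on the original scale, as with transform = exp
mae_exp <- mean(abs(exp(y_log) - exp(pred_log)))

# Merged groups (assumed behaviour): the errors of the two components
# are combined into one Euclidean distance per observation
north_err <- c(3, 0)  # hypothetical errors for north_wind
east_err  <- c(4, 1)  # hypothetical errors for east_wind
mae_merged <- mean(sqrt(north_err^2 + east_err^2))
mse_merged <- mean(north_err^2 + east_err^2)
```

The two MAE values differ because the absolute error is taken after the transform is applied, so the choice of transform determines the scale on which models are compared.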

Value

A list containing:

  • mean.scores - mean of the N_sim estimates of four criteria: MSE, MAE, CRPS, sCRPS

  • sd.scores - standard deviation of the N_sim estimates of four criteria: MSE, MAE, CRPS, sCRPS
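How the two returned components relate to the N_sim repeated estimates can be sketched in base R. The score values below are random placeholders, and the exact shape of the object returned by cross_validation() (for example, when several models are compared) may differ from this simplified list.

```r
set.seed(1)
N_sim <- 5

# Hypothetical score estimates: one row per simulation, one column per criterion
scores <- matrix(
  abs(rnorm(N_sim * 4)),
  nrow = N_sim,
  dimnames = list(NULL, c("MSE", "MAE", "CRPS", "sCRPS"))
)

# Summarise across the N_sim simulations, criterion by criterion
result <- list(
  mean.scores = colMeans(scores),
  sd.scores   = apply(scores, 2, sd)
)
```

A small sd.scores relative to mean.scores suggests that N_sim is large enough for the score estimates to be stable.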

Description

Compute the cross-validation for the ngme model. The data are first split into sub_groups (a list of target and train data), and cross-validation is then performed on each group.
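The split into target and train data can be sketched for the k-fold case in base R. This is an illustration under stated assumptions, not the package's own splitting code: n is a hypothetical number of observations, and folds are assigned uniformly at random.

```r
set.seed(42)
n <- 20  # hypothetical number of observations
k <- 5   # number of folds

# Randomly assign each observation to one of k equally sized folds
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold, the held-out points are predicted and the rest are used for training
test_idx  <- lapply(seq_len(k), function(i) which(fold_id == i))
train_idx <- lapply(seq_len(k), function(i) which(fold_id != i))
```

Lists of index vectors like these have the form described for the test_idx and train_idx arguments, which are used when type = "custom".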