
Compute the cross-validation for the ngme model

Perform cross-validation for an ngme model. The data are first split into sub-groups (a list of target and training data).
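To illustrate the splitting step, the R sketch below builds (train, test) sub-groups for the k-fold and leave-percent-out types. The helpers make_kfold_groups() and make_lpo_groups() are hypothetical. They only mirror the roles of `k`, `percent`, `times`, and `seed`; they are not ngme's internal code.

```r
# Hypothetical helpers, not part of ngme: they only illustrate how data
# indices can be partitioned into (train, test) sub-groups for each `type`.

# k-fold: every index is tested exactly once across the k folds.
make_kfold_groups <- function(n, k = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fold_id <- sample(rep_len(seq_len(k), n))   # balanced random fold labels
  lapply(seq_len(k), function(f) {
    list(test = which(fold_id == f), train = which(fold_id != f))
  })
}

# Leave-percent-out: `times` independent random splits, each holding out
# a proportion `percent` (between 0 and 1) of the data.
make_lpo_groups <- function(n, percent = 0.2, times = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_test <- max(1, round(percent * n))
  lapply(seq_len(times), function(i) {
    test <- sort(sample.int(n, n_test))
    list(test = test, train = setdiff(seq_len(n), test))
  })
}

folds  <- make_kfold_groups(n = 20, k = 5, seed = 1)
splits <- make_lpo_groups(n = 20, percent = 0.2, times = 10, seed = 1)
```

In this sketch, leave-one-out corresponds to make_kfold_groups(n, k = n), which puts each index in its own test fold.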
Source: R/validation.R

Usage
cross_validation(
  ngme,
  type = "k-fold",
  seed = NULL,
  print = TRUE,
  N_sim = 5,
  n_gibbs_samples = 500,
  n_burnin = 100,
  k = 5,
  percent = 0.2,
  times = 10,
  metric = NULL,
  test_idx = NULL,
  train_idx = NULL,
  keep_pred = FALSE,
  parallel = FALSE,
  thining_gap = 1,
  cores_layer1 = if (parallel) min(parallel::detectCores(), 2) else 1,
  cores_layer2 = if (parallel) min(parallel::detectCores(), 2) else 1,
  merge_groups = FALSE,
  merged_group_name = NULL
)

Arguments
- ngme
an ngme object, or a list of ngme objects (if comparing multiple models)
- type
character, one of c("k-fold", "loo", "lpo", "custom"). "k-fold" is k-fold cross-validation (provide `k`); "loo" is leave-one-out; "lpo" is leave-percent-out (provide `percent`, between 0 and 1, and `times`); "custom" uses user-defined groups (provide `test_idx` and `train_idx`).
- seed
random seed
- print
logical, print information during computation
- N_sim
integer, number of simulations (e.g., MAE, MSE, etc. are each estimated N_sim times)
- n_gibbs_samples
number of Gibbs samples of the latent process, used for computing CRPS and sCRPS
- n_burnin
number of burn-in iterations
- k
integer, number of folds (only for k-fold type)
- percent
numeric between 0 and 1, proportion of the data used for testing (only for lpo type)
- times
integer, number of test cases, i.e. repeated random train/test splits (only for lpo type)
- metric
Optional function, or list of functions (one per model), that maps the group-wise observations and predictions at a single location to the quantity to be scored. The function receives a list containing at least `y` (a named numeric vector of group values) and may also use `samples1`/`samples2` (matrices whose rows are named by group and whose columns index posterior draws). It must return either a numeric scalar (optionally named) or a list with component `y` (a scalar) and, optionally, `samples1`/`samples2` (numeric vectors with one entry per posterior draw) and `label` (character). When `NULL`, the original per-group scores are computed. For example, to compare a linear combination of two fields, use `metric = function(data) { res <- 2 * data$y["A"] + data$y["B"]; names(res) <- "combo"; res }`. To simply sum all group values, return `sum(data$y)`.
- test_idx
a list of indices into the data, specifying which data points are to be predicted (only for custom type)
- train_idx
a list of indices into the data, specifying which data points are used for re-sampling, not re-estimation (only for custom type)
- keep_pred
logical, keep the test information (pred_1, pred_2) as attributes of the returned object; pred_1 and pred_2 are the predictions from the two chains
- parallel
logical, run in parallel mode
- thining_gap
integer, gap between retained samples for thinning: 0 means no thinning; 1 keeps every other sample (50% of the samples)
- cores_layer1
integer, number of cores for the first layer (over testing samples)
- cores_layer2
integer, number of cores for the second layer (computing scores over the N_sim simulations)
- merge_groups
logical, if TRUE, merge groups as vector components (e.g., for vector-valued wind data with north_wind and east_wind). MAE becomes Euclidean distance, MSE becomes squared Euclidean distance, etc.
- merged_group_name
character, name for the merged group when merge_groups = TRUE. If NULL (the default), the "group1_group2" format is used.
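The R sketch below exercises the `metric` contract described above on a hand-built input list. The input `example_data`, the group names "A" and "B", and the numbers are made up. The layout of `y` and `samples1`/`samples2` is taken from the argument description and has not been checked against what cross_validation() actually passes.

```r
# Hypothetical per-location input mimicking the structure described for `metric`:
# `y` is a named vector of group values; `samples1`/`samples2` are matrices
# with one row per group and one column per posterior draw.
example_data <- list(
  y = c(A = 1.5, B = -0.5),
  samples1 = matrix(c(1,  2, 3,
                      0, -1, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("A", "B"), NULL))
)
example_data$samples2 <- example_data$samples1 + 0.1

# 1) Scalar metric: the named linear combination 2 * A + B.
combo_metric <- function(data) {
  res <- 2 * data$y["A"] + data$y["B"]
  names(res) <- "combo"
  res
}

# 2) Sum over all groups.
sum_metric <- function(data) sum(data$y)

# 3) List-returning metric: applies the same combination to the observation
#    and to every posterior draw, returning one value per draw.
combo_metric_full <- function(data) {
  w <- c(A = 2, B = 1)
  combine <- function(s) as.numeric(colSums(s[names(w), , drop = FALSE] * w))
  out <- list(y = sum(w * data$y[names(w)]), label = "combo")
  if (!is.null(data$samples1)) out$samples1 <- combine(data$samples1)
  if (!is.null(data$samples2)) out$samples2 <- combine(data$samples2)
  out
}

combo_metric(example_data)                 # combo = 2.5
sum_metric(example_data)                   # 1
combo_metric_full(example_data)$samples1   # 2 3 7
```

With a fitted model (called `fit` here, hypothetically) whose groups are named "A" and "B", one of these functions would be passed as `metric = combo_metric`. The list-returning form also supplies transformed draws, presumably so that sample-based scores can be computed on the combined quantity.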
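For merge_groups = TRUE, errors are measured on the vector of group values instead of on each group separately. The R sketch below uses made-up wind data to show the difference. The MAE-to-Euclidean and MSE-to-squared-Euclidean mapping follows the description above. Averaging over locations and the exact merged-name format are assumptions.

```r
# Observed and predicted wind vectors at three test locations (made-up numbers).
obs  <- cbind(north_wind = c(1, 0, 2), east_wind = c(0, 3, 1))
pred <- cbind(north_wind = c(4, 0, 2), east_wind = c(4, 0, 1))

# Separate groups: ordinary absolute / squared errors, one value per group.
mae_separate <- colMeans(abs(obs - pred))
mse_separate <- colMeans((obs - pred)^2)

# Merged groups: each row is treated as one vector-valued observation.
err_norm   <- sqrt(rowSums((obs - pred)^2))  # Euclidean distance per location
mae_merged <- mean(err_norm)                 # MAE -> mean Euclidean distance
mse_merged <- mean(err_norm^2)               # MSE -> mean squared Euclidean distance

# Assumed to mirror the default "group1_group2" naming.
merged_name <- paste(colnames(obs), collapse = "_")
```

Here the merged MSE (34/3) equals the sum of the per-group MSEs (3 + 25/3), since a squared Euclidean norm is a sum of squares. The merged MAE (8/3) is not the sum of the per-group MAEs (1 + 7/3).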
Value

A list containing:
- mean.scores: mean of the N_sim estimates of the four criteria (MSE, MAE, CRPS, sCRPS)
- sd.scores: standard deviation of the N_sim estimates of the four criteria (MSE, MAE, CRPS, sCRPS)
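The sketch below shows how such mean and standard-deviation summaries could arise for a single test point. It is written in R, rests on several assumptions, and is not ngme's internal code:

- rnorm() stands in for the Gibbs sampler.
- Draws are post-processed with n_burnin and thining_gap. Using a step of thining_gap + 1 is an assumption consistent with "0 means no thinning, 1 keeps 50%".
- CRPS uses the standard two-sample estimator E|X - y| - 0.5 E|X - X'|, with X and X' taken from two independent chains (loosely mirroring pred_1/pred_2).
- sCRPS uses the negatively oriented scaled-CRPS form E|X - y| / E|X - X'| + 0.5 log E|X - X'|.
- The point prediction for MSE/MAE is the mean of the draws.

```r
# Hypothetical sketch (not ngme's code): score one observation `y_obs`
# from two independent sets of posterior predictive draws.
score_point <- function(y_obs, draws1, draws2) {
  pred <- mean(draws1)                        # point prediction (assumed: mean)
  e_xy <- mean(abs(draws1 - y_obs))           # estimates E|X - y|
  e_xx <- mean(abs(draws1 - draws2))          # estimates E|X - X'|
  c(MSE   = (pred - y_obs)^2,
    MAE   = abs(pred - y_obs),
    CRPS  = e_xy - 0.5 * e_xx,                # negatively oriented CRPS
    sCRPS = e_xy / e_xx + 0.5 * log(e_xx))    # scaled CRPS (assumed form)
}

# Drop burn-in, then keep every (thining_gap + 1)-th draw.
post_process <- function(chain, n_burnin, thining_gap) {
  stopifnot(n_burnin < length(chain))
  kept <- chain[seq.int(n_burnin + 1, length(chain))]
  kept[seq(1, length(kept), by = thining_gap + 1)]
}

set.seed(42)
N_sim <- 5; n_gibbs_samples <- 500; n_burnin <- 100; thining_gap <- 1
y_obs <- 0.3

# One row per simulation, one column per criterion.
scores <- t(replicate(N_sim, {
  d1 <- post_process(rnorm(n_burnin + n_gibbs_samples), n_burnin, thining_gap)
  d2 <- post_process(rnorm(n_burnin + n_gibbs_samples), n_burnin, thining_gap)
  score_point(y_obs, d1, d2)
}))

mean.scores <- colMeans(scores)       # analogous to mean.scores
sd.scores   <- apply(scores, 2, sd)   # analogous to sd.scores
```

In an actual cross-validation run, the per-point scores would also be averaged over all test locations before summarising across the N_sim repetitions.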