
Compute the cross-validation for the ngme model. The data are first split into sub_groups (a list of target and train data), and cross-validation is then performed on these groups.
Source: R/validation.R
cross_validation.Rd
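The split into sub_groups mentioned in the description can be pictured with a small base-R sketch. It is an illustration only, not the package's implementation: the helper `make_kfold_groups` and the element names `target` and `train` are hypothetical, and a simple random partition is assumed.

```r
# Hypothetical helper (not part of the package): randomly partition the
# indices 1..n into k folds and return, for each fold, the target (test)
# indices and the train indices.
make_kfold_groups <- function(n, k = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Assign each observation to a fold, with fold sizes as equal as possible
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(i) {
    list(
      target = which(fold_id == i),  # indices to be predicted
      train  = which(fold_id != i)   # indices used as training data
    )
  })
}

groups <- make_kfold_groups(n = 20, k = 5, seed = 1)
length(groups)        # 5 sub-groups
lengths(groups[[1]])  # target: 4, train: 16
```

Every observation appears as a target in exactly one fold, so each data point is predicted once across the k sub-groups.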
Usage
cross_validation(
ngme,
type = "k-fold",
seed = NULL,
print = TRUE,
N_sim = 5,
n_gibbs_samples = 500,
n_burnin = 100,
k = 5,
percent = 0.2,
times = 10,
metric = NULL,
test_idx = NULL,
train_idx = NULL,
keep_pred = FALSE,
parallel = FALSE,
thining_gap = 1,
cores_layer1 = if (parallel) min(parallel::detectCores(), 2) else 1,
cores_layer2 = if (parallel) min(parallel::detectCores(), 2) else 1,
merge_groups = FALSE,
merged_group_name = NULL,
data = NULL,
chain_combine = c("param_mean", "predictive_average")
)
Arguments
- ngme
an ngme object, or a list of ngme objects (if comparing multiple models)
- type
character, in c("k-fold", "loo", "lpo", "custom"). k-fold is k-fold cross-validation, provide k; loo is leave-one-out; lpo is leave-percent-out, provide percent (from 0 to 1); custom is user-defined groups, provide test_idx and train_idx
- seed
random seed
- print
print information during computation
- N_sim
integer, number of simulations (e.g., MAE, MSE, ... are estimated N_sim times)
- n_gibbs_samples
number of Gibbs samples of the latent process, used for computing CRPS, sCRPS
- n_burnin
number of burn-in iterations
- k
integer (only for k-fold type)
- percent
proportion of the data held out for testing, between 0 and 1 (only for lpo type)
- times
number of test cases (only for lpo type)
- metric
Optional function or list of functions (one per model) that maps the group-wise observations/predictions for a single location to the quantity that should be scored. The function receives a list containing at least `y` (a named numeric vector of group values) and may optionally use `samples1` and `samples2` (matrices with rows named by group and columns indexing posterior draws). The function must return either a numeric scalar (optionally named), or a list with component `y` and optional components `samples1`, `samples2`, and `label`. When `NULL`, the original per-group scores are computed. Example: `metric = function(data) 2 * data$y["A"] + data$y["B"]`. To sum all groups, return `sum(data$y)`.
- test_idx
a list of indices of the data (which data points to be predicted) (only for custom type)
- train_idx
a list of indices of the data (which data points to be used for re-sampling (not re-estimation)) (only for custom type)
- keep_pred
logical, keep test information (pred_1, pred_2) in the return (as attributes), pred_1 and pred_2 are the prediction of the two chains
- parallel
logical, run in parallel mode
- thining_gap
integer, the gap between retained samples when thinning: 0 means no thinning; 1 keeps 50% of the samples (used for CRPS, sCRPS, etc.)
- cores_layer1
integer, number of cores for the first layer (over testing samples)
- cores_layer2
integer, number of cores for the second layer (over computing scores for N_sim simulations)
- merge_groups
logical, if TRUE, merge groups as vector components (e.g., for vector-valued wind data with north_wind, east_wind). MAE becomes Euclidean distance, MSE becomes squared Euclidean distance, etc.
- merged_group_name
character, name for the merged group when merge_groups=TRUE. If NULL, uses "group1_group2" format (default: NULL)
- data
optional data.frame used to replace the original fitting data before running CV. If `NULL`, the data stored in `ngme` is used. If provided, the model is rebuilt on `data` while reusing fitted parameters from `ngme`.
- chain_combine
how to combine multiple optimization chains: `"param_mean"` uses the fitted object directly (default), while `"predictive_average"` computes predictions from each optimization chain and averages at the predictive level.
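
As an illustration of the `metric` argument described above, the following sketch defines two metric functions and applies them to a mock input. The mock list and its values are made up for demonstration, and the function names `weighted_metric` and `sum_metric` are hypothetical. Only the `y` component is used, and the final call is left as a comment because it assumes an already fitted ngme object named `fit`.

```r
# Mock input in the shape described for `metric`: `y` is a named numeric
# vector of group values at a single location (values are made up).
mock <- list(y = c(A = 1.5, B = -0.5))

# Weighted combination of two groups, as in the documented example
weighted_metric <- function(data) 2 * data$y["A"] + data$y["B"]

# Sum over all groups
sum_metric <- function(data) sum(data$y)

weighted_metric(mock)  # named numeric scalar: A = 2.5
sum_metric(mock)       # 1

# Hypothetical call; `fit` is assumed to be a fitted ngme object:
# cross_validation(fit, type = "k-fold", k = 5, seed = 1, metric = sum_metric)
```

Both functions return a numeric scalar, which is one of the two return forms the argument accepts. Returning a list with `y` and optional `samples1`, `samples2`, and `label` would be needed only when the transformed posterior draws should be scored as well.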