
Introduction

In this vignette we will introduce how to fit Whittle–Matérn fields with general smoothness based on finite element and rational approximations. The theory for this approach is provided in Bolin et al. (2023) and Bolin, Simas, and Xiong (2023). For the implementation, we make use of the rSPDE package for the rational approximations.

These models are thus implemented using finite element approximations. Such approximations are not needed for integer smoothness parameters, and for the details about the exact models we refer to the vignettes

For details on the construction of metric graphs, see Working with metric graphs

For further details on data manipulation on metric graphs, see Data manipulation on metric graphs

Constructing the graph and the mesh

We begin by loading the rSPDE and MetricGraph packages:
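A minimal sketch of the loading step, assuming both packages are already installed:

```r
# Load the two packages used throughout this vignette
library(rSPDE)
library(MetricGraph)
```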

As an example, we consider the following metric graph

  edge1 <- rbind(c(0,0),c(1,0))
  edge2 <- rbind(c(0,0),c(0,1))
  edge3 <- rbind(c(0,1),c(-1,1))
  theta <- seq(from=pi,to=3*pi/2,length.out = 20)
  edge4 <- cbind(sin(theta),1+ cos(theta))
  edges = list(edge1, edge2, edge3, edge4)
  graph <- metric_graph$new(edges = edges)
  graph$plot()

To construct a FEM approximation of a Whittle–Matérn field with general smoothness, we must first construct a mesh on the graph.

  graph$build_mesh(h = 0.1)
  graph$plot(mesh=TRUE)

In the command build_mesh, the argument h decides the largest spacing between nodes in the mesh. As can be seen in the plot, the mesh is very coarse, so let’s reduce the value of h and rebuild the mesh:

graph$build_mesh(h = 0.01)
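To get a feeling for how h controls the mesh size, the following base R sketch places equally spaced nodes on a single edge so that the spacing is at most h. The helper edge_mesh is hypothetical and only illustrates the idea; it is not necessarily how build_mesh() places the nodes internally. The edge length used is roughly that of the quarter-circle edge in the example graph.

```r
# Hypothetical helper: equally spaced nodes on an edge of length `len`
# with spacing at most `h` (an illustration, not the MetricGraph internals).
edge_mesh <- function(len, h) {
  n_seg <- ceiling(len / h)            # smallest number of segments with spacing <= h
  seq(0, len, length.out = n_seg + 1)
}

edge_len <- pi / 2                     # approximate length of the quarter-circle edge
for (h_toy in c(0.1, 0.01)) {
  nodes_toy <- edge_mesh(edge_len, h_toy)
  cat("h =", h_toy, ": nodes =", length(nodes_toy),
      ", spacing =", max(diff(nodes_toy)), "\n")
}
```

Reducing h by a factor of ten increases the number of nodes on the edge by roughly the same factor, which is the price paid for the finer approximation.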

We are now ready to specify the model $(\kappa^2 - \Delta)^{\alpha/2} \tau u = \mathcal{W}$ for the Whittle–Matérn field $u$. For this, we use the matern.operators function from the rSPDE package:

  sigma <- 1.3
  range <- 0.15
  nu <- 0.8 

  rspde.order <- 2
  op <- matern.operators(nu = nu, range = range, sigma = sigma, 
                         parameterization = "matern",
                         m = rspde.order, graph = graph)                     

As can be seen in the code, we specify $\kappa$ via the practical correlation range $\sqrt{8\nu}/\kappa$. Also, the model is not parametrized by $\tau, \alpha$ but instead by $\sigma, \nu$. Here, sigma denotes the standard deviation of the field and nu is the smoothness parameter, which is related to $\alpha$ via the relation $\alpha = \nu + 1/2$. The object op contains the matrices needed for evaluating the distribution of the stochastic weights in the FEM approximation.
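For reference, the relations between the two parameterizations can be collected in one place. The first two are the ones stated in the text, with $\rho$ denoting the practical correlation range. The expression for $\sigma^2$ is an assumption in the sense that the vignette does not state it: it is the standard Matérn variance formula with dimension $d = 1$, and it is consistent with the parameter estimates reported in the model summaries of this vignette.

```latex
\alpha = \nu + \frac{1}{2}, \qquad
\rho = \frac{\sqrt{8\nu}}{\kappa}, \qquad
\sigma^2 = \frac{\Gamma(\nu)}{\tau^2 \, \kappa^{2\nu} \, (4\pi)^{1/2} \, \Gamma(\nu + 1/2)} .
```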

Let us simulate the field $u$ at the mesh locations and plot the result:

u <- simulate(op)
graph$plot_function(X = u, type = "plotly")

If we want to evaluate $u(s)$ at some locations $s_1,\ldots, s_n$, we need to multiply the weights with the FEM basis functions $\varphi_i(s)$ evaluated at the locations. For this, we can construct the observation matrix $\boldsymbol{\mathrm{A}}$, with elements $A_{ij} = \varphi_j(s_i)$, which links the FEM basis functions to the locations. This can be done by the function fem_basis in the metric graph object. To illustrate this, let us simulate some observation locations on the graph and construct the matrix:

obs.per.edge <- 100
obs.loc <- NULL
for(i in 1:graph$nE) {
  obs.loc <- rbind(obs.loc,
                   cbind(rep(i,obs.per.edge), runif(obs.per.edge)))
}
n.obs <- obs.per.edge*graph$nE
A <- graph$fem_basis(obs.loc)

In the code, we generate $100$ observation locations per edge in the graph, drawn at random. Note that the observation locations are assumed to be given in the format $(e, d)$, where $e$ denotes the edge of the observation and $d$ is the position on the edge, i.e., the relative distance from the first vertex of the edge.
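To make the role of the observation matrix concrete, here is a base R sketch for a single edge of unit length with piecewise linear (hat) basis functions. The helper hat_basis_matrix and all variable names are hypothetical and are not part of MetricGraph; the sketch only shows how a matrix with entries $A_{ij} = \varphi_j(s_i)$ turns nodal weights into values at arbitrary positions on the edge.

```r
# Hat-function basis matrix on one edge; a toy stand-in for graph$fem_basis().
# Assumes all positions in `s` lie inside the range of `mesh_nodes`.
hat_basis_matrix <- function(s, mesh_nodes) {
  A <- matrix(0, nrow = length(s), ncol = length(mesh_nodes))
  for (i in seq_along(s)) {
    # Index j of the mesh interval [mesh_nodes[j], mesh_nodes[j + 1]] containing s[i]
    j <- findInterval(s[i], mesh_nodes, rightmost.closed = TRUE)
    w <- (s[i] - mesh_nodes[j]) / (mesh_nodes[j + 1] - mesh_nodes[j])
    A[i, j] <- 1 - w       # weight of the left node
    A[i, j + 1] <- w       # weight of the right node
  }
  A
}

mesh_nodes <- seq(0, 1, by = 0.25)      # mesh nodes on the edge
s_toy <- c(0.1, 0.5, 0.9)               # observation positions on the edge
A_toy <- hat_basis_matrix(s_toy, mesh_nodes)

w_nodes <- mesh_nodes^2                 # some values at the mesh nodes
u_at_s <- as.vector(A_toy %*% w_nodes)  # linear interpolation to the positions
print(A_toy)
print(u_at_s)
```

Each row of the matrix has at most two nonzero entries that sum to one, so multiplying by it amounts to linear interpolation between neighbouring mesh nodes.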

To compute the precision matrix from the covariance-based rational approximation, one can use the precision() method on the object returned by the matern.operators() function:

  Q <- precision(op)

As an illustration of the model, let us compute the covariance function between the process at $s=(2,0.1)$, that is, the point at edge 2 and distance on edge 0.1, and all the other mesh points. To this end, we can use the helper function cov_function_mesh that is contained in the op object:

  c_cov <- op$cov_function_mesh(matrix(c(2,0.1),1,2))
  graph$plot_function(c_cov, type = "plotly")

Using the model for inference

There is built-in support for computing log-likelihood functions and performing kriging prediction in the rSPDE package, which we can use for the graph model. To illustrate this, we use the simulation to create some noisy observations of the process. We generate the observations as $Y_i = 1 + 2x_{i1} - 3 x_{i2} + u(s_i) + \varepsilon_i$, where $\varepsilon_i \sim N(0,\sigma_e^2)$ is Gaussian measurement noise, and $x_1$ and $x_2$ are covariates generated from the positions of the observations on the graph (the edge number and the relative position on the edge, respectively).

    sigma.e <- 0.1

    x1 <- obs.loc[,1]
    x2 <- obs.loc[,2]

    Y <- 1 + 2*x1 - 3*x2 + as.vector(A %*% u + sigma.e * rnorm(n.obs))

Let us now fit the model. We will use the graph_lme() function (which, for the finite element models, acts as a wrapper for the rspde_lme() function from the rSPDE package). We first assemble the data.frame() with the observations, the observation locations and the covariates:

df_data <- data.frame(y = Y, edge_number = obs.loc[,1],
                        distance_on_edge = obs.loc[,2],
                        x1 = x1, x2 = x2)

Let us now add the data to the graph object and plot it:

graph$add_observations(data = df_data, normalized = TRUE)
## Adding observations...
## list()
graph$plot(data = "y")

We can now fit the model. To this end, we use the graph_lme() function and set the model to "WM".

fit <- graph_lme(y ~ x1 + x2, graph = graph, model = "WM")

Let us obtain a summary of the model:

summary(fit)
## 
## Latent model - Whittle-Matern
## 
## Call:
## graph_lme(formula = y ~ x1 + x2, graph = graph, model = "WM")
## 
## Fixed effects:
##             Estimate Std.error z-value Pr(>|z|)    
## (Intercept)   1.1342    0.6768   1.676   0.0938 .  
## x1            1.9773    0.1879  10.524  < 2e-16 ***
## x2           -2.9377    0.7049  -4.167 3.08e-05 ***
## 
## Random effects:
##        Estimate Std.error z-value
## alpha  1.305517  0.018531  70.452
## tau    0.044937  0.004135  10.866
## kappa 17.839895  2.473788   7.212
## 
## Random effects (Matern parameterization):
##       Estimate Std.error z-value
## nu     0.80552   0.01853  43.469
## sigma  1.31867   0.12520  10.532
## range  0.14230   0.01913   7.438
## 
## Measurement error:
##          Estimate Std.error z-value
## std. dev 0.098714  0.006472   15.25
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-Likelihood:  -127.6056 
## Number of function calls by 'optim' = 501
## Optimization method used in 'optim' = Nelder-Mead
## 
## Time used to:     fit the model =  8.02456 secs
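As a sanity check on the two blocks of random-effect estimates in the summary above, the following base R sketch converts the SPDE parameters (alpha, tau, kappa) to the Matérn parameters (nu, sigma, range). The function spde_to_matern is a hypothetical helper, and the variance formula is an assumption: it is the standard Matérn variance expression with dimension d = 1, which is not stated in this vignette.

```r
# Convert SPDE parameters to Matern parameters.
# Assumes d = 1 and the standard Matern variance formula (an assumption here).
spde_to_matern <- function(alpha, tau, kappa, d = 1) {
  nu <- alpha - d / 2
  sigma <- sqrt(gamma(nu) /
                  (tau^2 * kappa^(2 * nu) * (4 * pi)^(d / 2) * gamma(nu + d / 2)))
  range <- sqrt(8 * nu) / kappa
  c(nu = nu, sigma = sigma, range = range)
}

# Estimates copied from the "Random effects" block of the summary
est_matern <- spde_to_matern(alpha = 1.305517, tau = 0.044937, kappa = 17.839895)
print(round(est_matern, 4))
```

The resulting values should agree, up to rounding, with the block labelled "Random effects (Matern parameterization)".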

We can also obtain additional information by using the function glance():

glance(fit)
## # A tibble: 1 × 9
##    nobs  sigma logLik   AIC   BIC deviance df.residual model               alpha
##   <int>  <dbl>  <dbl> <dbl> <dbl>    <dbl>       <dbl> <chr>               <dbl>
## 1   400 0.0987  -128.  269.  297.     255.         393 Covariance-Based M…  1.31
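The information criteria reported by glance() can be reproduced from the log-likelihood with the usual definitions. The sketch below assumes that the number of estimated parameters is 7 (three fixed effects, three latent parameters and the measurement error standard deviation); this count is an assumption, although it is consistent with the reported df.residual.

```r
# Values reported by summary(fit) and glance(fit)
loglik_fit <- -127.6056
n_data <- 400
k_par <- 3 + 3 + 1          # assumed parameter count (fixed + latent + noise)

aic_fit <- -2 * loglik_fit + 2 * k_par
bic_fit <- -2 * loglik_fit + k_par * log(n_data)
df_res <- n_data - k_par

print(c(AIC = aic_fit, BIC = bic_fit, df.residual = df_res))
```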

Let us compare the values of the parameters of the latent model with the true ones:

print(data.frame(sigma = c(sigma, fit$matern_coeff$random_effects[2]), 
                   range = c(range, fit$matern_coeff$random_effects[3]),
                   nu = c(nu, fit$matern_coeff$random_effects[1]),
                   row.names = c("Truth", "Estimates")))
##              sigma     range        nu
## Truth     1.300000 0.1500000 0.8000000
## Estimates 1.318671 0.1422951 0.8055167

Kriging

Given that we have estimated the parameters, let us compute, at the mesh nodes, the kriging predictor of the field given the observations.

We will perform kriging with the predict() method. To this end, we need to provide a data.frame containing the prediction locations, as well as the values of the covariates at the prediction locations.

  df_pred <- data.frame(edge_number = graph$mesh$VtE[,1],
                        distance_on_edge = graph$mesh$VtE[,2],
                        x1 = graph$mesh$VtE[,1],
                        x2 = graph$mesh$VtE[,2])

  u.krig <- predict(fit, newdata = df_pred, normalized = TRUE)

The estimate is shown in the following figure

  graph$plot_function(as.vector(u.krig$mean))  
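For intuition about what a kriging predictor computes, here is a self-contained base R sketch for a generic Gaussian latent vector with precision matrix Q observed through a matrix A under Gaussian noise. It uses a toy tridiagonal precision matrix and hypothetical variable names; it is not the rational approximation used by predict() above, only the standard posterior mean formula, written in precision form and checked against the covariance form.

```r
set.seed(1)
n_w <- 50                                   # number of latent weights
# Toy precision matrix: tridiagonal and strictly diagonally dominant
Q_toy <- diag(2.5, n_w)
Q_toy[cbind(1:(n_w - 1), 2:n_w)] <- -1
Q_toy[cbind(2:n_w, 1:(n_w - 1))] <- -1

# Observe 10 of the weights directly, so A_obs picks out those entries
obs_idx <- sort(sample(n_w, 10))
A_obs <- diag(n_w)[obs_idx, , drop = FALSE]
sd_noise <- 0.1

# Simulate w ~ N(0, Q^{-1}) and noisy observations
R_chol <- chol(Q_toy)                       # Q = R'R
w_toy <- backsolve(R_chol, rnorm(n_w))      # Cov(w) = R^{-1} R^{-T} = Q^{-1}
y_obs <- as.vector(A_obs %*% w_toy) + sd_noise * rnorm(length(obs_idx))

# Kriging predictor (posterior mean), precision form
Q_post <- Q_toy + crossprod(A_obs) / sd_noise^2
mu_post <- solve(Q_post, crossprod(A_obs, y_obs) / sd_noise^2)

# Same predictor in covariance form, for comparison
S_toy <- solve(Q_toy)
mu_cov <- S_toy %*% t(A_obs) %*%
  solve(A_obs %*% S_toy %*% t(A_obs) + diag(sd_noise^2, length(obs_idx)), y_obs)
print(max(abs(mu_post - mu_cov)))           # numerically negligible difference
```

The precision form only requires solving a system with Q plus a low-rank update, which stays sparse when Q is sparse; this is the main computational appeal of FEM-based models.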

We can also use the augment() function to easily plot the predictions. Let us now build a 3D plot and add the observed values on top of the predictions:

p <- augment(fit, newdata = df_pred, normalized = TRUE) %>% 
          graph$plot_function(data = ".fitted", type = "plotly")

graph$plot(data = "y", p = p, type = "plotly")          

Fitting a model with replicates

Let us now illustrate how to simulate a data set with replicates and then fit a model to such data. To simulate a latent model with replicates, all we do is set the nsim argument to the number of replicates.

  n.rep <- 30
  u.rep <- simulate(op, nsim = n.rep)

Now, let us generate the observed values $Y$:

  sigma.e <- 0.3
  Y.rep <- A %*% u.rep + sigma.e * matrix(rnorm(n.obs * n.rep), ncol = n.rep)

Note that $Y$ is a matrix with 30 columns, each column containing one replicate. We need to turn Y.rep into a vector and create an auxiliary vector repl indexing the replicates:

y_vec <- as.vector(Y.rep)
repl <- rep(1:n.rep, each = n.obs)                       

df_data_repl <- data.frame(y = y_vec,
                              edge_number = rep(obs.loc[,1], n.rep),
                              distance_on_edge = rep(obs.loc[,2], n.rep), 
                              repl = repl)
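The stacking convention matters here: as.vector() stacks a matrix column by column, so the replicate index has to be built with each and the location information has to be repeated as a whole. A small base R illustration with toy values and hypothetical names:

```r
n_loc_toy <- 3    # observation locations
n_rep_toy <- 2    # replicates
# Column 1 holds replicate 1, column 2 holds replicate 2
Y_toy <- matrix(c(11, 12, 13,
                  21, 22, 23), nrow = n_loc_toy, ncol = n_rep_toy)

y_long <- as.vector(Y_toy)                         # column 1, then column 2
repl_long <- rep(1:n_rep_toy, each = n_loc_toy)    # replicate of each entry
loc_long <- rep(1:n_loc_toy, times = n_rep_toy)    # location of each entry

print(data.frame(y = y_long, loc = loc_long, repl = repl_long))
```

Using times instead of each for the replicate index would silently pair observations with the wrong replicate.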

Let us clear the previous observations and add the new data to the graph:

graph$add_observations(data = df_data_repl, normalized = TRUE, 
                            group = "repl", clear_obs = TRUE)
## Adding observations...
## list()

We can now fit the model in the same way as before by using the graph_lme() function. Note that we can optimize in parallel by setting parallel to TRUE. If we do not specify which replicates to consider, in the which_repl argument, all replicates will be considered.

fit_repl <- graph_lme(y ~ -1, graph = graph, model = "WM", parallel = TRUE)

Observe that we have received a warning saying that the Hessian was not positive-definite, which ended up creating NaNs for the standard errors. Indeed, let us see a summary of the fit:

summary(fit_repl)
## 
## Latent model - Whittle-Matern
## 
## Call:
## graph_lme(formula = y ~ -1, graph = graph, model = "WM", parallel = TRUE)
## 
## No fixed effects.
## 
## Random effects:
##       Estimate Std.error z-value
## alpha  1.28724       NaN     NaN
## tau    0.05301       NaN     NaN
## kappa 15.57972   0.39112   39.83
## 
## Random effects (Matern parameterization):
##       Estimate Std.error z-value
## nu    0.787237       NaN     NaN
## sigma 1.320496  0.025176   52.45
## range 0.161079  0.004835   33.32
## 
## Measurement error:
##          Estimate Std.error z-value
## std. dev 0.302088  0.002948   102.5
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-Likelihood:  -9741.837 
## Number of function calls by 'optim' = 59
## Optimization method used in 'optim' = L-BFGS-B
## 
## Time used to:     fit the model =  34.30693 secs 
##   set up the parallelization = 2.54156 secs
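The NaN entries in the summary above are what one would expect when standard errors are obtained as square roots of the diagonal of an inverse Hessian that is not positive definite. The following base R sketch illustrates the mechanism with a toy matrix; it is an illustration under that assumption and does not reproduce how the package computes its standard errors.

```r
# Toy symmetric "Hessian" that is NOT positive definite
H_toy <- matrix(c(2.0, 0.0, 0.0,
                  0.0, 1.0, 1.5,
                  0.0, 1.5, 1.0), nrow = 3, byrow = TRUE)
print(eigen(H_toy, symmetric = TRUE)$values)   # one eigenvalue is negative

# Standard errors taken as sqrt(diag(H^{-1}))
V_toy <- solve(H_toy)
se_toy <- suppressWarnings(sqrt(diag(V_toy)))  # negative "variances" give NaN
print(se_toy)
```

As in the summary, some standard errors remain finite while those tied to the indefinite block become NaN.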

Let us, then, follow the suggestion from the warning and refit the model setting improve_hessian to TRUE. This will obtain a more precise estimate of the Hessian, which can possibly fix this issue:

fit_repl <- graph_lme(y ~ -1, graph = graph, model = "WM", 
                      parallel = TRUE, improve_hessian = TRUE)

We see that we did not receive any warning now, and the Std. errors were computed accordingly:

summary(fit_repl)
## 
## Latent model - Whittle-Matern
## 
## Call:
## graph_lme(formula = y ~ -1, graph = graph, model = "WM", parallel = TRUE, 
##     improve_hessian = TRUE)
## 
## No fixed effects.
## 
## Random effects:
##        Estimate Std.error z-value
## alpha  1.287237  0.011143  115.52
## tau    0.053011  0.002775   19.11
## kappa 15.579717  0.565564   27.55
## 
## Random effects (Matern parameterization):
##       Estimate Std.error z-value
## nu    0.787237  0.011143   70.65
## sigma 1.320496  0.025176   52.45
## range 0.161079  0.004835   33.32
## 
## Measurement error:
##          Estimate Std.error z-value
## std. dev 0.302088  0.002993   100.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-Likelihood:  -9741.837 
## Number of function calls by 'optim' = 59
## Optimization method used in 'optim' = L-BFGS-B
## 
## Time used to:     fit the model =  30.27751 secs 
##   compute the Hessian = 9.96476 secs 
##   set up the parallelization = 2.54477 secs

Let us also take a glance at the fit:

glance(fit_repl)
## # A tibble: 1 × 9
##    nobs sigma logLik    AIC    BIC deviance df.residual model              alpha
##   <int> <dbl>  <dbl>  <dbl>  <dbl>    <dbl>       <dbl> <chr>              <dbl>
## 1 12000 0.302 -9742. 19492. 19521.   19484.       11996 Covariance-Based …  1.29

Let us compare the values of the parameters of the latent model with the true ones:

print(data.frame(sigma = c(sigma, fit_repl$matern_coeff$random_effects[2]), 
                   range = c(range, fit_repl$matern_coeff$random_effects[3]),
                   nu = c(nu, fit_repl$matern_coeff$random_effects[1]),
                   row.names = c("Truth", "Estimates")))
##              sigma     range        nu
## Truth     1.300000 0.1500000 0.8000000
## Estimates 1.320496 0.1610788 0.7872375

Let us do kriging. We will use the same prediction locations as in the previous example. Let us get predictions for replicate 10, then add the original observations on top of them:

p <- augment(fit_repl, which_repl = 10, newdata = df_pred, normalized = TRUE) %>% 
          graph$plot_function(data = ".fitted", type = "plotly")

graph$plot(data = "y", group = 10, type = "plotly", p = p)

Using the R-INLA implementation

We also have an R-INLA implementation of the rational SPDE approach for metric graphs.

We begin by defining the model by using the rspde.metric_graph() function. This function takes the same arguments as the function rspde.matern(). We refer the reader to the R-INLA implementation of the rational SPDE approach vignette for further details.

We begin by clearing the previous observations and adding the observations (for the case without replicates) to the graph:

graph$clear_observations()
graph$add_observations(data = df_data, normalized = TRUE)
## Adding observations...
## list()

Let us create the model object:

  library(INLA)
  rspde_model <- rspde.metric_graph(graph)

By default, the order of the rational approximation is 2.

We can now create the auxiliary quantities that will be needed with the graph_data_rspde() function:

  data_rspde <- graph_data_rspde(rspde_model, name = "field")

The rest is standard: we create the formula object, the stack object, and then fit the model by using the inla() function. So, first we create the formula object:

  f.s <- y ~ -1 + Intercept + x1 + x2 + f(field, model = rspde_model)

Now we create the inla.stack object. To this end, observe that data_rspde contains the dataset as the data component, the index as the index component and the so-called A matrix as the basis component. We will now create the stack using these components:

  stk.dat <- inla.stack(
    data = data_rspde[["data"]]["y"], A = list(data_rspde[["basis"]],1), tag = "est",
    effects =
      list(c(
        data_rspde[["index"]],
        list(Intercept = 1)), list(x1 = data_rspde[["data"]]["x1"] ,
                                      x2 = data_rspde[["data"]]["x2"])
      )
    )

Finally, we can fit the model:

  rspde_fit <- inla(f.s, data = inla.stack.data(stk.dat),
    control.inla = list(int.strategy = "eb"),
    control.predictor = list(A = inla.stack.A(stk.dat), compute = TRUE),
    num.threads = "1:1"
  )

We can use the same functions as for rSPDE models fitted with R-INLA. For instance, we can see the results in the original scale by creating the result object:

  result_fit <- rspde.result(rspde_fit, "field", rspde_model)
  summary(result_fit)
##             mean        sd 0.025quant 0.5quant 0.975quant     mode
## std.dev 1.416640 0.1536490  1.1436500 1.406330   1.746420 1.383690
## range   0.152527 0.0340364  0.0969751 0.148646   0.230109 0.141099
## nu      0.866600 0.1188480  0.6394470 0.864836   1.103940 0.861391

Let us compare with the true values:

  result_df <- data.frame(
    parameter = c("std.dev", "range", "nu"),
    true = c(sigma, range, nu),
    mean = c(
      result_fit$summary.std.dev$mean,
      result_fit$summary.range$mean,
      result_fit$summary.nu$mean
    ),
    mode = c(
      result_fit$summary.std.dev$mode,
      result_fit$summary.range$mode,
      result_fit$summary.nu$mode
    )
  )
  print(result_df)
##   parameter true      mean      mode
## 1   std.dev 1.30 1.4166379 1.3836894
## 2     range 0.15 0.1525273 0.1410994
## 3        nu 0.80 0.8666003 0.8613914

We can also plot the posterior marginal densities with the help of the gg_df() function:

  posterior_df_fit <- gg_df(result_fit)

  library(ggplot2)

  ggplot(posterior_df_fit) + geom_line(aes(x = x, y = y)) + 
  facet_wrap(~parameter, scales = "free") + labs(y = "Density")

Kriging with the R-INLA implementation

We will do kriging on the mesh locations:

  pred_loc <- graph$mesh$VtE

Let us now add the observations for prediction:

graph$add_observations(data = data.frame(y=rep(NA,nrow(pred_loc)), 
                                x1 = graph$mesh$VtE[,1],
                                x2 = graph$mesh$VtE[,2],
                                edge_number = pred_loc[,1], 
                                distance_on_edge = pred_loc[,2]), 
                                normalized = TRUE)
## Adding observations...
## list()

Let us now create a new model and then compute the auxiliary components at the prediction locations. To this end, we set the argument only_pred to TRUE, in which case the function returns the data.frame containing the NA data.

  rspde_model_prd <- rspde.metric_graph(graph) 
  data_rspde_prd <- graph_data_rspde(rspde_model_prd, only_pred = TRUE)

Let us build the prediction stack using the components of data_rspde_prd and gather it with the estimation stack.

  ef.prd <- 
    list(c(data_rspde_prd[["index"]], list(Intercept = 1)), 
          list(x1 = data_rspde_prd[["data"]][["x1"]],
                x2 = data_rspde_prd[["data"]][["x2"]]))
  stk.prd <- inla.stack(
    data = data.frame(y = data_rspde_prd[["data"]][["y"]]),
    A = list(data_rspde_prd[["basis"]],1), tag = "prd",
    effects = ef.prd
  )
  stk.all <- inla.stack(stk.dat, stk.prd)

Let us obtain the predictions:

rspde_fitprd <- inla(f.s,
  data = inla.stack.data(stk.all),
  control.predictor = list(
    A = inla.stack.A(stk.all),
    compute = TRUE, link = 1
  ),
  control.compute = list(
    return.marginals = FALSE,
    return.marginals.predictor = FALSE
  ),
  control.inla = list(int.strategy = "eb"),
  num.threads = "1:1"
)

Let us now extract the indices of the predicted nodes and store the means:

id.prd <- inla.stack.index(stk.all, "prd")$data
m.prd <- rspde_fitprd$summary.fitted.values$mean[id.prd]

Finally, let us plot the predicted values. To this end we will use the plot_function() graph method.

  graph$plot_function(m.prd, type = "plotly")  

Using R-INLA implementation to fit models with replicates

Let us begin by cloning the graph and clearing the observations on the cloned graph:

graph_rep <- graph$clone()
graph_rep$clear_observations()

We will now add the data with replicates to the graph:

graph_rep$add_observations(data = data.frame(y=as.vector(Y.rep), 
                          edge_number = rep(obs.loc[,1], n.rep), 
                          distance_on_edge = rep(obs.loc[,2], n.rep),
                          repl = rep(1:n.rep, each = n.obs)), 
                          group = "repl",
                          normalized = TRUE)
## Adding observations...
## list()

Let us create a new rspde model object:

rspde_model_rep <- rspde.metric_graph(graph_rep)

To fit the model with replicates we need to create the auxiliary quantities with the graph_data_rspde() function, where we set the repl argument to .all since we want to use all replicates:

data_rspde_rep <- graph_data_rspde(rspde_model_rep, 
                      name = "field", repl = ".all",
                      repl_col = "repl")

Let us now create the corresponding inla.stack object:

st.dat.rep <- inla.stack(
  data = data_rspde_rep[["data"]],
  A = data_rspde_rep[["basis"]],
  effects = data_rspde_rep[["index"]]
)

Observe that we need the response variable y to be a vector. We can now create the formula object, remembering that since we gave the name argument field, when creating the index, we need to pass field.repl to the formula:

f.rep <-
  y ~ -1 + f(field,
    model = rspde_model_rep,
    replicate = field.repl
  )

We can, finally, fit the model:

rspde_fit_rep <-
  inla(f.rep,
    data = inla.stack.data(st.dat.rep),
    family = "gaussian",
    control.predictor =
      list(A = inla.stack.A(st.dat.rep)),
    num.threads = "1:1"
  )

We can obtain the estimates in the original scale with the rspde.result() function:

  result_fit_rep <- rspde.result(rspde_fit_rep, "field", rspde_model_rep)
  summary(result_fit_rep)
##             mean         sd 0.025quant 0.5quant 0.975quant    mode
## std.dev 1.321010 0.02422350   1.274020 1.320810   1.369170 1.32045
## range   0.148397 0.00584935   0.137153 0.148317   0.160125 0.14821
## nu      0.847571 0.03557730   0.778960 0.847092   0.918645 0.84572

Let us compare with the true values of the parameters:

  result_rep_df <- data.frame(
    parameter = c("std.dev", "range", "nu"),
    true = c(sigma, range, nu),
    mean = c(
      result_fit_rep$summary.std.dev$mean,
      result_fit_rep$summary.range$mean,
      result_fit_rep$summary.nu$mean
    ),
    mode = c(
      result_fit_rep$summary.std.dev$mode,
      result_fit_rep$summary.range$mode,
      result_fit_rep$summary.nu$mode
    )
  )
  print(result_rep_df)
##   parameter true      mean      mode
## 1   std.dev 1.30 1.3210090 1.3204468
## 2     range 0.15 0.1483970 0.1482098
## 3        nu 0.80 0.8475711 0.8457203

We can also plot the posterior marginal densities with the help of the gg_df() function:

  posterior_df_fit_rep <- gg_df(result_fit_rep)

  ggplot(posterior_df_fit_rep) + geom_line(aes(x = x, y = y)) + 
  facet_wrap(~parameter, scales = "free") + labs(y = "Density")

Using inlabru implementation

The inlabru package allows us to fit models and do kriging in a straightforward manner, without having to handle A matrices, indices or inla.stack objects. Therefore, we recommend this implementation when fitting real data.

Let us clear the graph, since it contains NA observations we used for prediction, add the observations again, and create a new rSPDE model object:

graph$clear_observations()
graph$add_observations(data = df_data, 
                          normalized = TRUE)
## Adding observations...
## list()
rspde_model <- rspde.metric_graph(graph)

Let us now load the inlabru package and create the component (which is inlabru’s formula-like object). Let us begin by building the auxiliary data with the graph_data_rspde() function, where we set the argument bru to TRUE so that the data is returned in a format suitable for inlabru:

data_rspde_bru <- graph_data_rspde(rspde_model, bru = TRUE)

Now, we create the component to be used in inlabru, in which the location columns .edge_number and .distance_on_edge of the data are passed as the input of the field:

    library(inlabru)
    cmp <-
    y ~ -1 + Intercept(1) + x1 + x2 + field(
                          cbind(.edge_number, .distance_on_edge), 
                          model = rspde_model
                          )                   

Now, we can directly fit the model, by using the data element of data_rspde_bru:

  rspde_bru_fit <-
    bru(cmp,
        data=data_rspde_bru[["data"]],
        options = list(num.threads = "1:1")
    )

Let us now obtain the estimates of the parameters in the original scale by using the rspde.result() function:

  result_bru_fit <- rspde.result(rspde_bru_fit, "field", rspde_model)
  summary(result_bru_fit)
##             mean       sd 0.025quant 0.5quant 0.975quant     mode
## std.dev 1.418920 0.154009   1.152140 1.405810   1.755610 1.374550
## range   0.150596 0.032822   0.097651 0.146578   0.226045 0.138624
## nu      0.868781 0.129118   0.621019 0.867422   1.125160 0.865624

Let us compare with the true values of the parameters:

  result_bru_df <- data.frame(
    parameter = c("std.dev", "range", "nu"),
    true = c(sigma, range, nu),
    mean = c(
      result_bru_fit$summary.std.dev$mean,
      result_bru_fit$summary.range$mean,
      result_bru_fit$summary.nu$mean
    ),
    mode = c(
      result_bru_fit$summary.std.dev$mode,
      result_bru_fit$summary.range$mode,
      result_bru_fit$summary.nu$mode
    )
  )
  print(result_bru_df)
##   parameter true      mean      mode
## 1   std.dev 1.30 1.4189179 1.3745526
## 2     range 0.15 0.1505963 0.1386245
## 3        nu 0.80 0.8687812 0.8656242

We can also plot the posterior marginal densities with the help of the gg_df() function:

  posterior_df_bru_fit <- gg_df(result_bru_fit)

  ggplot(posterior_df_bru_fit) + geom_line(aes(x = x, y = y)) + 
  facet_wrap(~parameter, scales = "free") + labs(y = "Density")

Kriging with the inlabru implementation

It is very easy to do kriging with the inlabru implementation. We simply need to provide the prediction locations to the predict() method.

In this example we will use the mesh locations. To this end we will use the get_mesh_locations() method. We also set bru=TRUE to obtain a data frame suitable to be used with inlabru. In this case, the mesh locations will be returned as a data.frame with the location columns .edge_number and .distance_on_edge. We will, then, add the covariates x1 and x2 to the data frame:

  prd_loc <- graph$get_mesh_locations(bru = TRUE)
  prd_loc[["x1"]] <- prd_loc[,1]
  prd_loc[["x2"]] <- prd_loc[,2]  

Now, we can simply provide these locations to the predict method along with the fitted object rspde_bru_fit:

  y_pred <- predict(rspde_bru_fit, newdata=prd_loc, 
                        ~Intercept + x1 + x2 + field)

Let us now prepare the predictions so we can plot them easily by using the process_rspde_predictions() function:

y_pred <- process_rspde_predictions(y_pred, graph = graph, PtE = prd_loc)

Finally, let us plot the predicted values. To this end we will use the plot() method on y_pred:

  plot(y_pred) 

We can also create the 3d plot, together with the true data:

p <- graph$plot(data = "y", type = "plotly")
plot(y_pred, type = "plotly", p = p)

Using inlabru to fit models with replicates

We can also use our inlabru implementation to fit models with replicates. We will consider the same data that was generated above, where the number of replicates is 30.

For this implementation we will use the rspde_model_rep object.

To create the component, we will pass the vector with the indices of the replicates as the replicate argument. We first obtain the auxiliary data with the function graph_data_rspde(), where we set the repl argument to .all, since we want all replicates. Further, we also set the argument bru to TRUE.

data_rspde_rep <- graph_data_rspde(rspde_model_rep, repl = ".all", 
                                    bru = TRUE, repl_col = "repl")

We can now define the bru component formula, passing the repl as the replicate argument:

  cmp_rep <-
    y ~ -1 + field(cbind(.edge_number, .distance_on_edge), 
                              model = rspde_model_rep,
                              replicate = repl)

Now, we are ready to fit the model:

  rspde_bru_fit_rep <-
    bru(cmp_rep,
        data=data_rspde_rep[["data"]],
        options=list(
        family = "gaussian",
        num.threads = "1:1")
    )

We can obtain the estimates in the original scale with the rspde.result() function:

  result_bru_fit_rep <- rspde.result(rspde_bru_fit_rep, "field", rspde_model_rep)
  summary(result_bru_fit_rep)
##             mean         sd 0.025quant 0.5quant 0.975quant    mode
## std.dev 1.321010 0.02422350   1.274020 1.320810   1.369170 1.32045
## range   0.148397 0.00584935   0.137153 0.148317   0.160125 0.14821
## nu      0.847571 0.03557730   0.778960 0.847092   0.918645 0.84572

Let us compare with the true values of the parameters:

  result_bru_rep_df <- data.frame(
    parameter = c("std.dev", "range", "nu"),
    true = c(sigma, range, nu),
    mean = c(
      result_bru_fit_rep$summary.std.dev$mean,
      result_bru_fit_rep$summary.range$mean,
      result_bru_fit_rep$summary.nu$mean
    ),
    mode = c(
      result_bru_fit_rep$summary.std.dev$mode,
      result_bru_fit_rep$summary.range$mode,
      result_bru_fit_rep$summary.nu$mode
    )
  )
  print(result_bru_rep_df)
##   parameter true      mean      mode
## 1   std.dev 1.30 1.3210090 1.3204468
## 2     range 0.15 0.1483970 0.1482098
## 3        nu 0.80 0.8475711 0.8457203

We can also plot the posterior marginal densities with the help of the gg_df() function:

  posterior_df_bru_fit_rep <- gg_df(result_bru_fit_rep)

  ggplot(posterior_df_bru_fit_rep) + geom_line(aes(x = x, y = y)) + 
  facet_wrap(~parameter, scales = "free") + labs(y = "Density")

Let us now do prediction for observations of replicate 10. We start by building the data list with the prediction locations:

  data_prd_repl <- graph$get_mesh_locations(bru = TRUE)
  data_prd_repl[["repl"]] <- rep(10, nrow(data_prd_repl))

Let us now obtain predictions for this replicate:

  y_pred <- predict(rspde_bru_fit_rep, 
                      newdata=data_prd_repl, 
                      ~field_eval(cbind(.edge_number, .distance_on_edge), 
                                    replicate = repl))

Let us now process the predictions:

  y_pred <- process_rspde_predictions(y_pred, graph = graph, PtE = data_prd_repl)

We can now plot the predictions along with the observed values for replicate 10:

p <- plot(y_pred, type = "plotly")
graph_rep$plot(data = "y", group = 10, type = "plotly", p = p)

An example with a non-stationary model

Our goal now is to show how one can fit a model with non-stationary $\sigma$ (standard deviation) and non-stationary $\rho$ (a range parameter). One can also use the parameterization in terms of the non-stationary SPDE parameters $\kappa$ and $\tau$.

We follow the same structure as INLA. However, INLA only allows one to specify B.tau and B.kappa matrices, and, in INLA, if one wants to parameterize in terms of range and standard deviation one needs to do it manually. Here we provide the option to directly provide the matrices B.sigma and B.range.

The usage of the matrices B.tau and B.kappa is identical to that of the corresponding ones in the inla.spde2.matern() function. The matrices B.sigma and B.range work in the same way, but they parameterize the standard deviation and range, respectively.

Corresponding columns of the B matrices refer to the same parameter. The first column does not correspond to any parameter to be estimated; it is a constant column.

So, for instance, if one wants sigma and range (or tau and kappa) to share a parameter, one simply lets the corresponding column be nonzero in both B.sigma and B.range (or in both B.tau and B.kappa).

Creating the graph and adding data

For this example we will consider the pems data contained in the MetricGraph package. The data consists of traffic speed observations on highways in the city of San Jose, California. The variable y contains the traffic speeds.

 pems_graph <- metric_graph$new(edges = pems$edges)
 pems_graph$add_observations(data = pems$data)
## list()
 pems_graph$prune_vertices()
 pems_graph$build_mesh(h=0.1)

The summary of this graph:

summary(pems_graph)
## A metric graph object with:
## 
## Vertices:
##   Total: 347 
##   Degree 1: 11;  Degree 2: 16;  Degree 3: 315;  Degree 4: 5; 
##   With incompatible directions:  17 
## 
## Edges: 
##   Total: 504 
##   Lengths: 
##       Min: 0.01040218  ; Max: 7.677232  ; Total: 470.7559 
##   Weights: 
##       Columns: .weights 
##   That are circles:  0 
## 
## Graph units: 
##   Vertices unit:  degree  ; Lengths unit:  km 
## 
## Longitude and Latitude coordinates:  TRUE
##   Which spatial package:  sp 
##   CRS:  +proj=longlat +datum=WGS84 +no_defs
## 
## Some characteristics of the graph:
##   Connected: TRUE
##   Has loops: FALSE
##   Has multiple edges: TRUE
##   Is a tree: FALSE
##   Distance consistent: FALSE
##   Has Euclidean edges: FALSE
## 
## Computed quantities inside the graph: 
##   Laplacian:  FALSE  ; Geodesic distances:  TRUE 
##   Resistance distances:  FALSE  ; Finite element matrices:  FALSE 
## 
## Mesh: 
##   Max h_e:  0.09998277  ; Min n_e:  0 
## 
## Data: 
##   Columns:  y 
##   Groups:  .group 
## 
## Tolerances: 
##   vertex-vertex:  0.001 
##   vertex-edge:  0.001 
##   edge-edge:  0

Observe that it is a non-Euclidean graph.

Let us create a non-stationary covariate from the position on the edge, which captures whether a traffic speed was recorded close to an intersection. We make this function symmetric around 0.5 by replacing the position x with 1 - x for points beyond 0.5, and we then multiply the result by 2. That is, the covariate is zero close to intersections and increases towards the midpoint of each edge.

cov_pos <- 2 * ifelse(pems_graph$mesh$VtE[,2] > 0.5, 
                    1-pems_graph$mesh$VtE[,2], 
                    pems_graph$mesh$VtE[,2])
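To see what this transformation does, here is a small base R sketch that applies the same formula to a few positions on a single edge. It assumes the positions are normalized to the interval [0, 1], as the threshold 0.5 in the code above suggests; the names `pos` and `tent` are only used for this illustration.

```r
# Normalized positions along a single edge (0 and 1 are the end vertices)
pos <- seq(0, 1, by = 0.25)

# Same transformation as for cov_pos: reflect around 0.5, then rescale to [0, 1]
tent <- 2 * ifelse(pos > 0.5, 1 - pos, pos)

print(data.frame(pos = pos, tent = tent))
```

The result is a tent-shaped function that is 0 at both ends of the edge and 1 at its midpoint.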

We will now build the non-stationary matrices to be used:

 B.sigma = cbind(0, 1, 0, cov_pos, 0)
 B.range = cbind(0, 0, 1,  0, cov_pos)

Let us also obtain the same covariate for the observations:

cov_obs <- pems$data[[".distance_on_edge"]]
cov_obs <- 2 * ifelse(cov_obs > 0.5, 
                      1 - cov_obs,
                      cov_obs)

Let us add this covariate to the data:

pems_graph$add_observations(data = pems_graph$mutate(cov_obs = cov_obs),
                            clear_obs = TRUE)
## Adding observations...
## The unit for edge lengths is km
## The current tolerance for removing distant observations is (in km): 3.83861599015656
## list()

Fitting the model with graph_lme

We are now in a position to fit this model using the graph_lme() function. We will also add cov_obs as a covariate in the model.

fit <- graph_lme(y ~ cov_obs, graph = pems_graph, model = list(type = "WhittleMatern", 
                    B.sigma = B.sigma, B.range = B.range, fem = TRUE))

Let us now obtain a summary of the fitted model:

summary(fit)
## 
## Latent model - Generalized Whittle-Matern
## 
## Call:
## graph_lme(formula = y ~ cov_obs, graph = pems_graph, model = list(type = "WhittleMatern", 
##     B.sigma = B.sigma, B.range = B.range, fem = TRUE))
## 
## Fixed effects:
##             Estimate Std.error z-value Pr(>|z|)    
## (Intercept)   47.173     2.771  17.026  < 2e-16 ***
## cov_obs        6.974     1.829   3.812 0.000138 ***
## 
## Random effects:
##         Estimate Std.error z-value
## alpha    2.04453   0.02968  68.877
## Theta 1  3.71475   0.53999   6.879
## Theta 2  2.82127   0.53088   5.314
## Theta 3 -1.75469   0.69441  -2.527
## Theta 4 -1.48582   0.68906  -2.156
## 
## Measurement error:
##          Estimate Std.error z-value
## std. dev  7.30787   0.05426   134.7
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Log-Likelihood:  -1209.851 
## Number of function calls by 'optim' = 501
## Optimization method used in 'optim' = Nelder-Mead
## 
## Time used to:     fit the model =  35.45902 secs

Let us plot the range parameter along the mesh, so we can see how it is varying:

est_range <- exp(B.range[,-1]%*%fit$coeff$random_effects[2:5])
pems_graph$plot_function(X = est_range, vertex_size = 0, 
                    type = "mapview", mapview_caption = "Range")
## Warning: Found less unique colors (100) than unique zcol values (6830)! 
## Interpolating color vector to match number of zcol values.
## Warning: Found less unique colors (100) than unique zcol values (6830)! 
## Interpolating color vector to match number of zcol values.

Similarly, we have for sigma:

est_sigma <- exp(B.sigma[,-1]%*%fit$coeff$random_effects[2:5])
pems_graph$plot_function(X = est_sigma, vertex_size = 0, 
                    type = "mapview", mapview_caption = "Sigma")
## Warning: Found less unique colors (100) than unique zcol values (6836)! 
## Interpolating color vector to match number of zcol values.
## Warning: Found less unique colors (100) than unique zcol values (6836)! 
## Interpolating color vector to match number of zcol values.
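To make the fitted coefficients easier to read, the following sketch evaluates the same log-linear expressions at two covariate values only: 0 (at an intersection) and 1 (at the midpoint of an edge). The values in `theta_hat` are copied from the graph_lme summary above (Theta 1 to Theta 4); the names `cov_vals`, `B.sigma.ex` and `B.range.ex` are introduced only for this illustration.

```r
# Estimates copied from the graph_lme summary above (Theta 1 to Theta 4)
theta_hat <- c(3.71475, 2.82127, -1.75469, -1.48582)

# Two illustrative covariate values: 0 (intersection) and 1 (edge midpoint)
cov_vals <- c(intersection = 0, midpoint = 1)

# Same column layout as B.sigma and B.range in the text
B.sigma.ex <- cbind(0, 1, 0, cov_vals, 0)
B.range.ex <- cbind(0, 0, 1, 0, cov_vals)

# Same expressions as used for est_sigma and est_range
sigma_ex <- drop(exp(B.sigma.ex[, -1] %*% theta_hat))
range_ex <- drop(exp(B.range.ex[, -1] %*% theta_hat))

print(rbind(sigma = sigma_ex, range = range_ex))
```

Since Theta 3 and Theta 4 are both negative, the fitted sigma and range parameters are smaller at edge midpoints than at intersections under this model.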

Our goal now is to plot the estimated marginal standard deviation of this model. To this end, we start by creating the non-stationary Matérn operator using the rSPDE package:

rspde_object_ns <- rSPDE::spde.matern.operators(graph = pems_graph,
                                                parameterization = "matern",
                                                B.sigma = B.sigma,
                                                B.range = B.range,
                                                theta = fit$coeff$random_effects[2:5],
                                                nu = fit$coeff$random_effects[1] - 0.5)
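The argument nu in the call above is obtained from the estimated alpha by subtracting 0.5, which is the usual relation between the two on a one-dimensional domain such as a metric graph. The sketch below carries out this conversion with the estimate from the summary. The relation between kappa and range used here, kappa = sqrt(8 * nu) / range, is an assumption about the convention behind the "matern" parameterization and should be checked against the rSPDE documentation.

```r
alpha_hat <- 2.04453        # copied from the graph_lme summary above
nu_hat <- alpha_hat - 0.5   # smoothness on a one-dimensional domain

# Range parameter at an intersection (covariate equal to zero), from Theta 2
range_val <- exp(2.82127)

# Assumed convention: kappa = sqrt(8 * nu) / range
kappa_val <- sqrt(8 * nu_hat) / range_val

print(c(nu = nu_hat, kappa = kappa_val))
```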

Now, we compute the estimated marginal standard deviation:

est_cov_matrix <- rspde_object_ns$covariance_mesh()
est_std_dev <- sqrt(Matrix::diag(est_cov_matrix))
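The step from a covariance matrix to marginal standard deviations can be illustrated with a toy example in base R. The locations, standard deviations and exponential correlation below are made up; the only point is that the square roots of the diagonal recover the marginal standard deviations.

```r
# Toy covariance matrix at four locations: exponential correlation with
# location-dependent standard deviations (all values are illustrative)
locs <- c(0, 0.5, 1, 1.5)
sd_true <- c(1, 2, 0.5, 1.5)
corr <- exp(-abs(outer(locs, locs, "-")))
cov_mat <- diag(sd_true) %*% corr %*% diag(sd_true)

# Marginal standard deviations are the square roots of the diagonal entries
sd_marginal <- sqrt(diag(cov_mat))
print(sd_marginal)
```

In the vignette code, Matrix::diag() plays the same role as diag() here, applied to the covariance matrix evaluated on the mesh.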

We can now plot:

pems_graph$plot_function(X = est_std_dev, vertex_size = 0, 
          type = "mapview", mapview_caption = "Std. dev")
## Warning: Found less unique colors (100) than unique zcol values (11618)! 
## Interpolating color vector to match number of zcol values.
## Warning: Found less unique colors (100) than unique zcol values (11618)! 
## Interpolating color vector to match number of zcol values.

Fitting the inlabru rSPDE model

Let us now fit the same model using inlabru. We start by defining the rSPDE model with the rspde.metric_graph() function:

rspde_model_nonstat <- rspde.metric_graph(pems_graph,
  B.sigma = B.sigma,
  B.range = B.range,
  parameterization = "matern") 

Let us now create the data.frame() and the vector with the replicate indices:

 data_rspde_bru_ns <- graph_data_rspde(rspde_model_nonstat, bru = TRUE)

Let us create the component and fit the model:

cmp_nonstat <-
  y ~ -1 + Intercept(1) + cov_obs + field(
    cbind(.edge_number, .distance_on_edge),
    model = rspde_model_nonstat
  )


rspde_fit_nonstat <-
  bru(cmp_nonstat,
    data = data_rspde_bru_ns[["data"]],
    family = "gaussian",
    options = list(num.threads = "1:1")
  )

We can get the summary:

summary(rspde_fit_nonstat)
## inlabru version: 2.12.0
## INLA version: 24.12.11
## Components:
## Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L), NULL
## cov_obs: main = linear(cov_obs), group = exchangeable(1L), replicate = iid(1L), NULL
## field: main = cgeneric(cbind(.edge_number, .distance_on_edge)), group = exchangeable(1L), replicate = iid(1L), NULL
## Likelihoods:
##   Family: 'gaussian'
##     Tag: ''
##     Data class: 'metric_graph_data', 'data.frame'
##     Response class: 'numeric'
##     Predictor: y ~ .
##     Used components: effects[Intercept, cov_obs, field], latent[]
## Time used:
##     Pre = 0.245, Running = 66.7, Post = 0.498, Total = 67.5 
## Fixed effects:
##             mean     sd 0.025quant 0.5quant 0.975quant   mode kld
## Intercept 25.584 25.298    -28.659   27.447     70.310 31.679   0
## cov_obs    1.390  1.857     -2.253    1.389      5.035  1.389   0
## 
## Random effects:
##   Name     Model
##     field CGeneric
## 
## Model hyperparameters:
##                                          mean    sd 0.025quant 0.5quant
## Precision for the Gaussian observations 0.021 0.002      0.017    0.020
## Theta1 for field                        3.715 0.151      3.507    3.699
## Theta2 for field                        4.736 0.299      4.318    4.706
## Theta3 for field                        2.045 0.563      1.285    1.901
## Theta4 for field                        1.537 0.192      1.078    1.562
## Theta5 for field                        0.223 0.535     -0.479    0.099
##                                         0.975quant   mode
## Precision for the Gaussian observations      0.025  0.020
## Theta1 for field                             4.075  3.577
## Theta2 for field                             5.448  4.468
## Theta3 for field                             3.381  1.424
## Theta4 for field                             1.787  1.723
## Theta5 for field                             1.497 -0.332
## 
## Deviance Information Criterion (DIC) ...............: 2325.11
## Deviance Information Criterion (DIC, saturated) ....: 457.24
## Effective number of parameters .....................: 136.74
## 
## Watanabe-Akaike information criterion (WAIC) ...: 2309.84
## Effective number of parameters .................: 96.58
## 
## Marginal log-Likelihood:  -1258.15 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
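The summary reports the measurement noise as a precision, whereas graph_lme() reported a standard deviation. A rough conversion, using only the numbers printed above, is sketched below. Note that transforming the posterior mean of the precision does not give the posterior mean of the standard deviation; quantiles, on the other hand, are preserved under monotone transformations.

```r
prec_mean <- 0.021          # posterior mean of the precision, from the summary
prec_q <- c(0.017, 0.025)   # 0.025 and 0.975 quantiles of the precision

# Plug-in value only: this is not the posterior mean of the standard deviation
sd_from_mean <- 1 / sqrt(prec_mean)

# The map prec -> 1 / sqrt(prec) is decreasing, so the quantile order flips
sd_interval <- rev(1 / sqrt(prec_q))

print(round(c(sd = sd_from_mean,
              lower = sd_interval[1],
              upper = sd_interval[2]), 2))
```

The graph_lme() estimate of the measurement error standard deviation (about 7.31) lies inside this interval, so the two fits are broadly in agreement on the noise level.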

We can obtain outputs with respect to parameters in the original scale by using the function rspde.result():

result_fit_nonstat <- rspde.result(rspde_fit_nonstat, "field", rspde_model_nonstat)
summary(result_fit_nonstat)
##                  mean       sd 0.025quant 0.5quant 0.975quant     mode
## Theta1.matern 3.71488 0.150985   3.506590  3.69875    4.07507 3.577090
## Theta2.matern 4.73571 0.299128   4.317940  4.70556    5.44849 4.468000
## Theta3.matern 2.04487 0.562966   1.285440  1.90073    3.38126 1.424390
## Theta4.matern 1.53725 0.191925   1.078050  1.56236    1.78684 1.723270
## nu            1.08987 0.243512   0.763014  1.04731    1.63126 0.782159

We can also plot the posterior densities. To this end we will use the gg_df() function, which creates ggplot2 user-friendly data frames:

posterior_df_fit <- gg_df(result_fit_nonstat)

ggplot(posterior_df_fit) + geom_line(aes(x = x, y = y)) + 
facet_wrap(~parameter, scales = "free") + labs(y = "Density")

References

Bolin, David, Mihály Kovács, Vivek Kumar, and Alexandre B. Simas. 2023. “Regularity and Numerical Approximation of Fractional Elliptic Differential Equations on Compact Metric Graphs.” Mathematics of Computation.
Bolin, David, Alexandre B. Simas, and Zhen Xiong. 2023. “Covariance-Based Rational Approximations of Fractional SPDEs for Computationally Efficient Bayesian Inference.” Journal of Computational and Graphical Statistics.