Skip to contents

Introduction

There has lately been much interest in statistical modeling of data on compact metric graphs such as street or river networks based on Gaussian random fields.

The R package MetricGraph contains functions for working with data and random fields on compact metric graphs. The main functionality is contained in the metric_graph class, which is used for specifying metric graphs, adding data to them, visualization, and other basic functions that are needed for working with data and random fields on metric graphs. The package also implements three types of Gaussian processes on metric graphs: The Whittle–Matérn fields introduced by Bolin, Simas, and Wallin (2024) and Bolin, Simas, and Wallin (2023), Gaussian processes with isotropic covariance functions Anderes, Møller, and Rasmussen (2020), and Gaussian models based on the graph Laplacian Borovitskiy et al. (2021).

Basic statistical tasks such as likelihood evaluation and prediction is implemented for these three types of models in MetricGraph. Further, the package also contains interfaces to R-INLA Lindgren and Rue (2015), package available from http://R-INLA. org/download/ and inlabru Bachl et al. (2019) that facilitates using those packages for full Bayesian inference of general Latent Gaussian Models (LGMs) that includes Whittle–Matérn fields on metric graphs.

The package is available to install via the repository https://github.com/davidbolin/MetricGraph.

The following sections describe the main functionality of the package and summarizes some of the required theory. Section 2 introduces metric graphs and the metric_graph class, Section 3 shows how to work with random fields on metric graphs, and Section 4 introduces the inlabru interface of the package through an application to real data. For a complete introduction to the functionality of the package, we refer to the Vignettes available on the package homepage https://davidbolin.github.io/MetricGraph/. In particular, that contains an introduction to the INLA interface, the implementation of Whittle–Matérn fields with general smoothness, and further details and examples for all methods in the package.

The metric_graph class

A compact metric graph Γ\Gamma consists of a set of finitely many vertices 𝒱={vi}\mathcal{V}=\{v_i\} and a finite set ={ej}\mathcal{E}=\{e_j\} of edges connecting the vertices. Each edge ee is defined by a pair of vertices (vi,vk)(v_i,v_k) and a finite length le(0,)l_e \in (0,\infty). An edge in the graph is a curve parametrized by arc-length, and a location sΓs\in \Gamma is a position on an edge, and can thus be represented as a touple (e,t)(e,t) where t[0,le]t\in[0,l_e].

Basic constructions

A metric graph is represented in the MetricGraph package through the class metric_graph. An object of this class can be constructed in two ways. The first is to specify the vertex matrix V and the edge matrix E, and it is then assumed that all edges are straight lines. The second, more flexible, option is to specify the object from a SpatialLines object using the sp package (Bivand, Pebesma, and Gomez-Rubio 2013).

To illustrate this, we use the osmdata package to download data from OpenStreetMap. In the following code, we extract the streets on the campus of King Abdullah University of Science and Technology (KAUST) as a SpatialLines object:

call <- opq(bbox = c(39.0884, 22.33, 39.115, 22.3056))
call <- add_osm_feature(call, key = "highway",value=c("motorway",
                                                      "primary","secondary",
                                                      "tertiary",
                                                      "residential"))
data <- osmdata_sf(call)

We can now create the metric graph as follows.

graph <- metric_graph$new(data)
graph$plot(vertex_size = 0.5)

Let us also build an interactive plot with the underlying map:

graph$plot(vertex_size = 0.5, type = "mapview")

Observe that when we create this graph from osmdata, it also includes the edge data as edge weights:

graph$get_edge_weights()
## # A tibble: 368 × 11
##    osm_id   name    bridge highway layer lit   maxspeed maxweight oneway surface
##    <chr>    <chr>   <chr>  <chr>   <chr> <chr> <chr>    <chr>     <chr>  <chr>  
##  1 38085402 Discov… NA     reside… NA    yes   40       NA        NA     NA     
##  2 38085409 King A… NA     primary NA    yes   60       NA        yes    NA     
##  3 39425304 King A… NA     primary NA    NA    20       NA        yes    NA     
##  4 39425484 Peace … NA     reside… NA    NA    40       NA        NA     NA     
##  5 39425550 Island… NA     reside… NA    NA    40       NA        NA     NA     
##  6 39425743 Coral … NA     reside… NA    yes   40       NA        NA     NA     
##  7 39425802 Hammou… NA     reside… NA    yes   40       NA        NA     NA     
##  8 39426595 Najel … NA     reside… NA    yes   40       NA        yes    NA     
##  9 39478848 Discov… NA     reside… NA    NA    40       NA        NA     NA     
## 10 39478849 Explor… NA     reside… NA    yes   40       NA        NA     NA     
## # ℹ 358 more rows
## # ℹ 1 more variable: .weights <dbl>

Now, note that when creating the graph there was a warning saying that the graph is not connected, so let us create a graph_components object that contains all connected components as graphs and then extract the largest connected component to work with

graphs <- graph_components$new(edges = data)
graph <- graphs$get_largest()
graph$plot(vertex_size = 0)

Here is the interactive plot:

graph$plot(vertex_size = 0, type = "mapview")

The graph object now contains all important features of the graph, such as the vertex matrix graph$V, the number of vertices graph$nV, the edge matrix graph$E, the number of edges graph$nE, and the vector of all edge lengths graph$get_edge_lengths() given in the unit km. Thus, we can obtain the range of the edge lengths in the unit m as:

range(graph$get_edge_lengths(unit="m")) 
## Units: [m]
## [1]    7.120828 1674.529905

Observe that it also contains the corresponding edge weights:

graph$get_edge_weights()
## # A tibble: 317 × 11
##    osm_id   name    bridge highway layer lit   maxspeed maxweight oneway surface
##    <chr>    <chr>   <chr>  <chr>   <chr> <chr> <chr>    <chr>     <chr>  <chr>  
##  1 38085402 Discov… NA     reside… NA    yes   40       NA        NA     NA     
##  2 38085409 King A… NA     primary NA    yes   60       NA        yes    NA     
##  3 39425304 King A… NA     primary NA    NA    20       NA        yes    NA     
##  4 39425484 Peace … NA     reside… NA    NA    40       NA        NA     NA     
##  5 39425550 Island… NA     reside… NA    NA    40       NA        NA     NA     
##  6 39425743 Coral … NA     reside… NA    yes   40       NA        NA     NA     
##  7 39426595 Najel … NA     reside… NA    yes   40       NA        yes    NA     
##  8 39478848 Discov… NA     reside… NA    NA    40       NA        NA     NA     
##  9 39478849 Explor… NA     reside… NA    yes   40       NA        NA     NA     
## 10 39478852 Transf… NA     reside… NA    NA    40       NA        NA     NA     
## # ℹ 307 more rows
## # ℹ 1 more variable: .weights <dbl>

We will also remove the vertices of degree 2 by using the prune_vertices() method:

graph$prune_vertices()
graph$plot()

Observe that no vertex of degree 2 was pruned. This means that the edge weights are different. If one wants to prune regardless of losing information from the edge weights, one can simply set the argument check_weights to FALSE in the prune_vertices() method:

graph$prune_vertices(check_weights=FALSE)
graph$plot()

However, note that a warning was given due to incompatible edge weights.

Understanding coordinates on graphs

The locations of the vertices are specified in Euclidean coordinates. However, when specifying a position on the graph, it is not practical to work with Euclidean coordinates since not all locations in Euclidean space are locations on the graph. It is instead better to specify a location on the graph by the touple (i,t)(i, t), where ii denotes the number of the edge and tt is the location on the edge. The location tt can either be specified as the distance from the start of the edge (and then takes values between 0 and the length of the edge) or as the normalized distance from the start of the edge (and then takes values between 0 and 1). The function coordinates can be used to convert between coordinates in Euclidean space and locations on the graph. For example the location at normalized distance 0.2 from the start of the second edge is:

graph$coordinates(PtE = matrix(c(2, 0.2), 1,2), normalized = TRUE)
##          [,1]     [,2]
## [1,] 39.11693 22.31903

The function can also be used to find the closest location on the graph for a location in Euclidean space:

graph$coordinates(XY = matrix(c(39.117, 22.319), 1,2))
##      [,1]      [,2]
## [1,]    2 0.1967297

In this case, the normalized argument decides whether the returned value should be given in normalized distance or not.

Adding data to the graph

Given that we have constructed the metric graph, we can now add data to it. For further details on data manipulation on metric graphs, see Data manipulation on metric graphs. As an example, let us sample some locations on edges at random and add data to them in the graph:

n.obs <- 10
data <- data.frame(edge_number = sample(1:graph$nE, n.obs, replace=TRUE),
                   distance_on_edge = runif(n.obs),
                   y = rnorm(n.obs))
graph$add_observations(data = data, normalized = TRUE)
## Adding observations...
## Assuming the observations are normalized by the length of the edge.
## The unit for edge lengths is km
## The current tolerance for removing distant observations is (in km): 0.837264952278885
graph$plot(data = "y", vertex_size = 0, type = "mapview")
## Warning: Found more unique colors (100) than unique zcol values (10)! 
## Trimming color vector to match number of zcol values.

One should note here that one needs to specify normalized = TRUE in the function to specify that the locations are in normalized distance on the edges. If this command is not set, the distances are interpreted as not being normalized. The add_observations() function accepts multiple types of inputs. One scenario that can be common in applications is to have the data as SpatialPoints objects, and one can then add the observations as a SpatialPointsDataFrame. To illustrate this, let us again sample some locations at random on the graph, then use the function coordinates to transform those to Longitude Latitude coordinates, which we then use to create a SpatialPointsDataFrame object which we add to the graph:

obs.loc <- cbind(sample(1:graph$nE, n.obs, replace=TRUE), runif(n.obs))
obs.lonlat <- graph$coordinates(PtE = obs.loc, normalized = TRUE)
obs <- rnorm(n.obs)
points <- SpatialPointsDataFrame(coords = obs.lonlat,
                                 data = data.frame(y = obs))
graph$add_observations(points)
## Adding observations...
## The unit for edge lengths is km
## The current tolerance for removing distant observations is (in km): 0.837264952278885
## Converting data to PtE
## This step may take long. If this step is taking too long consider pruning the vertices to possibly obtain some speed up.
## $removed
## [1] y        .coord_x .coord_y
## <0 rows> (or 0-length row.names)
graph$plot(data = "y", vertex_size = 0, type = "mapview")

If we want to replace the data in the object, we can use clear_observations() to remove all current data.

Working with functions on metric graphs

When working with data on metric graphs, one often wants to display functions on the graph. The best way to visualize functions on the graph is to evaluate them on a fine mesh over the graph and then use plot_function. To illustrate this procedure, let us construct a mesh on the graph:

graph$build_mesh(h = 100/1000)
graph$plot(mesh=TRUE, type = "mapview")
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a
## multiple of vector length (arg 1)

In the command build_mesh, the argument h decides the largest spacing between nodes in the mesh. Above we chose that as 100m which is a bit coarse. So let us reduce the value of h to 10m and rebuild the mesh:

  graph$build_mesh(h = 10/1000)

Suppose now that we want to display the function f(s)=Longitude(s)Latitude(s)f(s) = \text{Longitude}(s) - \text{Latitude}(s) on this graph. We then first evaluate it on the vertices of the mesh and then use the function plot_function to display it:

lon <- graph$mesh$V[, 1]
lat <- graph$mesh$V[, 2]
f <- lon - lat
graph$plot_function(X = f, vertex_size = 0, edge_width = 0.5,
                type = "mapview")