
MetricGraph: Random Fields on Metric Graphs
David Bolin, Alexandre B. Simas, and Jonas Wallin
Created: 2022-11-26. Last modified: 2025-04-21.
Source:vignettes/MetricGraph.Rmd
MetricGraph.Rmd
Introduction
There has lately been much interest in statistical modeling of data on compact metric graphs such as street or river networks based on Gaussian random fields.
The R
package MetricGraph
contains
functions for working with data and random fields on compact metric
graphs. The main functionality is contained in the
metric_graph
class, which is used for specifying metric
graphs, adding data to them, visualization, and other basic functions
that are needed for working with data and random fields on metric
graphs. The package also implements three types of Gaussian processes on
metric graphs: The Whittle–Matérn fields introduced by Bolin,
Simas, and Wallin (2024) and Bolin,
Simas, and Wallin (2023), Gaussian processes with isotropic
covariance functions Anderes, Møller, and Rasmussen (2020), and
Gaussian models based on the graph Laplacian Borovitskiy et al. (2021).
Basic statistical tasks such as likelihood evaluation and prediction
is implemented for these three types of models in
MetricGraph
. Further, the package also contains interfaces
to R-INLA
Lindgren and Rue (2015), package available
from http://R-INLA. org/download/
and inlabru
Bachl et al. (2019) that facilitates using
those packages for full Bayesian inference of general Latent Gaussian
Models (LGMs) that includes Whittle–Matérn fields on metric graphs.
The package is available to install via the repository https://github.com/davidbolin/MetricGraph.
The following sections describe the main functionality of the package
and summarizes some of the required theory. Section 2 introduces metric
graphs and the metric_graph
class, Section 3 shows how to
work with random fields on metric graphs, and Section 4 introduces the
inlabru
interface of the package through an application to
real data. For a complete introduction to the functionality of the
package, we refer to the Vignettes available on the package homepage https://davidbolin.github.io/MetricGraph/.
In particular, that contains an introduction to the INLA
interface, the implementation of Whittle–Matérn fields with general
smoothness, and further details and examples for all methods in the
package.
The metric_graph
class
A compact metric graph consists of a set of finitely many vertices and a finite set of edges connecting the vertices. Each edge is defined by a pair of vertices and a finite length . An edge in the graph is a curve parametrized by arc-length, and a location is a position on an edge, and can thus be represented as a touple where .
Basic constructions
A metric graph is represented in the MetricGraph
package
through the class metric_graph
. An object of this class can
be constructed in two ways. The first is to specify the vertex matrix
V
and the edge matrix E
, and it is then
assumed that all edges are straight lines. The second, more flexible,
option is to specify the object from a SpatialLines
object
using the sp
package (Bivand,
Pebesma, and Gomez-Rubio 2013).
To illustrate this, we use the osmdata
package to
download data from OpenStreetMap. In the following code, we extract the
streets on the campus of King Abdullah University of Science and
Technology (KAUST) as a SpatialLines
object:
call <- opq(bbox = c(39.0884, 22.33, 39.115, 22.3056))
call <- add_osm_feature(call, key = "highway",value=c("motorway",
"primary","secondary",
"tertiary",
"residential"))
data <- osmdata_sf(call)
We can now create the metric graph as follows.
graph <- metric_graph$new(data)
graph$plot(vertex_size = 0.5)
Let us also build an interactive plot with the underlying map:
graph$plot(vertex_size = 0.5, type = "mapview")
Observe that when we create this graph from osmdata
, it
also includes the edge data as edge weights:
graph$get_edge_weights()
## # A tibble: 368 × 11
## osm_id name bridge highway layer lit maxspeed maxweight oneway surface
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 38085402 Discov… NA reside… NA yes 40 NA NA NA
## 2 38085409 King A… NA primary NA yes 60 NA yes NA
## 3 39425304 King A… NA primary NA NA 20 NA yes NA
## 4 39425484 Peace … NA reside… NA NA 40 NA NA NA
## 5 39425550 Island… NA reside… NA NA 40 NA NA NA
## 6 39425743 Coral … NA reside… NA yes 40 NA NA NA
## 7 39425802 Hammou… NA reside… NA yes 40 NA NA NA
## 8 39426595 Najel … NA reside… NA yes 40 NA yes NA
## 9 39478848 Discov… NA reside… NA NA 40 NA NA NA
## 10 39478849 Explor… NA reside… NA yes 40 NA NA NA
## # ℹ 358 more rows
## # ℹ 1 more variable: .weights <dbl>
Now, note that when creating the graph there was a warning saying
that the graph is not connected, so let us create a
graph_components
object that contains all connected
components as graphs and then extract the largest connected component to
work with
graphs <- graph_components$new(edges = data)
graph <- graphs$get_largest()
graph$plot(vertex_size = 0)
Here is the interactive plot:
graph$plot(vertex_size = 0, type = "mapview")
The graph
object now contains all important features of
the graph, such as the vertex matrix graph$V
, the number of
vertices graph$nV
, the edge matrix graph$E
,
the number of edges graph$nE
, and the vector of all edge
lengths graph$get_edge_lengths()
given in the unit km.
Thus, we can obtain the range of the edge lengths in the unit m as:
range(graph$get_edge_lengths(unit="m"))
## Units: [m]
## [1] 7.120828 1674.529905
Observe that it also contains the corresponding edge weights:
graph$get_edge_weights()
## # A tibble: 317 × 11
## osm_id name bridge highway layer lit maxspeed maxweight oneway surface
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 38085402 Discov… NA reside… NA yes 40 NA NA NA
## 2 38085409 King A… NA primary NA yes 60 NA yes NA
## 3 39425304 King A… NA primary NA NA 20 NA yes NA
## 4 39425484 Peace … NA reside… NA NA 40 NA NA NA
## 5 39425550 Island… NA reside… NA NA 40 NA NA NA
## 6 39425743 Coral … NA reside… NA yes 40 NA NA NA
## 7 39426595 Najel … NA reside… NA yes 40 NA yes NA
## 8 39478848 Discov… NA reside… NA NA 40 NA NA NA
## 9 39478849 Explor… NA reside… NA yes 40 NA NA NA
## 10 39478852 Transf… NA reside… NA NA 40 NA NA NA
## # ℹ 307 more rows
## # ℹ 1 more variable: .weights <dbl>
We will also remove the vertices of degree 2 by using the
prune_vertices()
method:
graph$prune_vertices()
graph$plot()
Observe that no vertex of degree 2 was pruned. This means that the
edge weights are different. If one wants to prune regardless of losing
information from the edge weights, one can simply set the argument
check_weights
to FALSE
in the
prune_vertices()
method:
graph$prune_vertices(check_weights=FALSE)
graph$plot()
However, note that a warning was given due to incompatible edge weights.
Understanding coordinates on graphs
The locations of the vertices are specified in Euclidean coordinates.
However, when specifying a position on the graph, it is not practical to
work with Euclidean coordinates since not all locations in Euclidean
space are locations on the graph. It is instead better to specify a
location on the graph by the touple
,
where
denotes the number of the edge and
is the location on the edge. The location
can either be specified as the distance from the start of the edge (and
then takes values between 0 and the length of the edge) or as the
normalized distance from the start of the edge (and then takes values
between 0 and 1). The function coordinates
can be used to
convert between coordinates in Euclidean space and locations on the
graph. For example the location at normalized distance 0.2 from the
start of the second edge is:
## [,1] [,2]
## [1,] 39.11693 22.31903
The function can also be used to find the closest location on the graph for a location in Euclidean space:
## [,1] [,2]
## [1,] 2 0.1967297
In this case, the normalized
argument decides whether
the returned value should be given in normalized distance or not.
Adding data to the graph
Given that we have constructed the metric graph, we can now add data to it. For further details on data manipulation on metric graphs, see Data manipulation on metric graphs. As an example, let us sample some locations on edges at random and add data to them in the graph:
n.obs <- 10
data <- data.frame(edge_number = sample(1:graph$nE, n.obs, replace=TRUE),
distance_on_edge = runif(n.obs),
y = rnorm(n.obs))
graph$add_observations(data = data, normalized = TRUE)
## Adding observations...
## Assuming the observations are normalized by the length of the edge.
## The unit for edge lengths is km
## The current tolerance for removing distant observations is (in km): 0.837264952278885
graph$plot(data = "y", vertex_size = 0, type = "mapview")
## Warning: Found more unique colors (100) than unique zcol values (10)!
## Trimming color vector to match number of zcol values.
One should note here that one needs to specify
normalized = TRUE
in the function to specify that the
locations are in normalized distance on the edges. If this command is
not set, the distances are interpreted as not being normalized. The
add_observations()
function accepts multiple types of
inputs. One scenario that can be common in applications is to have the
data as SpatialPoints
objects, and one can then add the
observations as a SpatialPointsDataFrame
. To illustrate
this, let us again sample some locations at random on the graph, then
use the function coordinates
to transform those to
Longitude Latitude coordinates, which we then use to create a
SpatialPointsDataFrame
object which we add to the
graph:
obs.loc <- cbind(sample(1:graph$nE, n.obs, replace=TRUE), runif(n.obs))
obs.lonlat <- graph$coordinates(PtE = obs.loc, normalized = TRUE)
obs <- rnorm(n.obs)
points <- SpatialPointsDataFrame(coords = obs.lonlat,
data = data.frame(y = obs))
graph$add_observations(points)
## Adding observations...
## The unit for edge lengths is km
## The current tolerance for removing distant observations is (in km): 0.837264952278885
## Converting data to PtE
## This step may take long. If this step is taking too long consider pruning the vertices to possibly obtain some speed up.
## $removed
## [1] y .coord_x .coord_y
## <0 rows> (or 0-length row.names)
graph$plot(data = "y", vertex_size = 0, type = "mapview")
If we want to replace the data in the object, we can use
clear_observations()
to remove all current data.
Working with functions on metric graphs
When working with data on metric graphs, one often wants to display
functions on the graph. The best way to visualize functions on the graph
is to evaluate them on a fine mesh over the graph and then use
plot_function
. To illustrate this procedure, let us
construct a mesh on the graph:
graph$build_mesh(h = 100/1000)
graph$plot(mesh=TRUE, type = "mapview")
## Warning in cbind(`Feature ID` = fid, mat): number of rows of result is not a
## multiple of vector length (arg 1)
In the command build_mesh
, the argument h
decides the largest spacing between nodes in the mesh. Above we chose
that as 100m which is a bit coarse. So let us reduce the value of
h
to 10m and rebuild the mesh:
graph$build_mesh(h = 10/1000)
Suppose now that we want to display the function
on this graph. We then first evaluate it on the vertices of the mesh and
then use the function plot_function
to display it:
lon <- graph$mesh$V[, 1]
lat <- graph$mesh$V[, 2]
f <- lon - lat
graph$plot_function(X = f, vertex_size = 0, edge_width = 0.5,
type = "mapview")