\( \newcommand{\bm}[1]{\boldsymbol{#1}} \)

Chapter 7 Implementing the Preferred Model for the ACA

7.1 Exploring the Queensland Shapefile

Shapefiles are useful in spatial modelling for two reasons: they contain the spatial information necessary to determine the spatial dependencies (e.g. latitude and longitude of centroids, distances between centroids, and adjacency, i.e. which areas are neighbours), and they provide the fundamental layer in spatial visualisations. Thematic and other spatial maps are very easy to create using shapefiles in conjunction with the R package ggplot2.

There are plenty of shapefiles freely available for download from the Internet. The Queensland shapefile was created by modifying an Australia-wide shapefile obtained from the Australian Bureau of Statistics (http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.001July%202011?OpenDocument). The areas in this shapefile represent statistical areas level 2 (SA2s). Australia has 2196 SA2s (excluding areas with undefined physical locations).

This shapefile can be downloaded as a zip file to a local directory on your computer, extracted, and then read into R. Alternatively, this process can be streamlined using the following R code.
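
The code in this chapter relies on several contributed packages. The following setup is a sketch inferred from the functions used below; load these (or your preferred equivalents) first:

# Packages used in this chapter
library(rgdal)       # readOGR, writeOGR
library(rmapshaper)  # ms_simplify
library(maptools)    # unionSpatialPolygons
library(rgeos)       # gIntersection
library(raster)      # extent
library(sp)          # proj4string, spDistsN1, bbox
library(spdep)       # poly2nb, nb2mat
library(CARBayes)    # S.CARleroux
library(dplyr)       # %>%, inner_join
library(ggplot2)     # mapping and plotting
library(scales)      # rescale
library(gridExtra)   # grid.arrange, arrangeGrob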

# Download, extract, and read shapefile 
URL <- "https://www.abs.gov.au/ausstats/subscriber.nsf/log?openagent&1270055001_sa2_2011_aust_shape.zip&1270.0.55.001&Data%20Cubes&7130A5514535C5FCCA257801000D3FBD&0&July%202011&23.12.2010&Latest.zip"
temp <- tempfile()
download.file(URL, temp)
file.rename(temp, paste0(temp, ".zip"))
temp <- paste0(temp, ".zip")
unzip(temp, exdir = tempdir())
map <- paste0(tempdir(), "/SA2_2011_AUST.shp") %>%
    readOGR()
unlink(temp)
rm(temp)

To create the Queensland shapefile, we will make the following modifications:

  • Exclude areas outside of Queensland.
  • Reduce the complexity of the polygon detail. This reduces plot rendering times and file size if you choose to save the modified shapefile.
  • Remove unnecessary attribute data from the shapefile (SA3 and SA4 codes and names, area in square metres, etc.)
  • Add average estimated residential population for the period 2010-2014. This will be used to remove areas with fewer than 5 residents on average per year during 2010-2014.
# Exclude areas outside of Queensland
map <- map[which(map$STE_NAME11 == "Queensland"),]       
N <- length(map)    # Now 526 areas

# Simplify polygons; keep = 0.03 retains ~3% of the original vertices
# (this may take up to several minutes)
map <- ms_simplify(map, keep = 0.03)

# Remove unnecessary attribute data from shapefile
names(map)
map$SA2_MAIN11 <- NULL
map$SA3_CODE11 <- NULL
map$SA3_NAME11 <- NULL
map$SA4_CODE11 <- NULL
map$SA4_NAME11 <- NULL
map$ALBERS_SQM <- NULL

# Read in estimated residential population data
ERP <- read.csv("Downloaded Data/Aus ERPs by SA2.csv")

# Match records to shapefile and append to shapefile
ERP <- ERP[match(map$SA2_5DIG11, ERP$SA2.Code),]
map$pop <- ERP$pop

If you want to save the resulting shapefile, use the following code:

writeOGR(map, dsn = fp.wd, layer = "Shapefile_QLD_SA2", driver = "ESRI Shapefile")

where fp.wd is the file path of your working directory or another local directory.
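
To reload the saved shapefile in a later session, the same interface works in reverse (a sketch, again assuming fp.wd is defined):

# Reload the saved shapefile
map <- readOGR(dsn = fp.wd, layer = "Shapefile_QLD_SA2")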

For the ACA, SA2s which had no residential population, or fewer than 5 residents on average per year during the study period 2010-2014, were excluded from the model (i.e. no estimates were provided for these areas). For illustration, we will exclude these Queensland SA2s in this example scenario as well. Note that we still want to keep these areas in the shapefile for plotting purposes - we only exclude them from the model.

# Save IDs of areas with no or small population
SA2.no.pop <- which(map$pop < 5)
map$SA2_5DIG11[SA2.no.pop]  # Actual SA2 codes
## [1] "31102" "31150" "31162" "31202" "31279" "31301" "31315" "31360"

Now let's explore this (modified) shapefile. To create good-looking maps in R, we define the following maps:

  • map.border: a map which contains only the outer border. If map is plotted as is, the borders between SA2s obscure the visual patterns, especially for smaller areas. The borders can be omitted entirely, but this obscures coastal regions, especially if the colours on the map are similar to the background colour. By creating a border-only copy of the map, it is possible to render a map with only the outside border by plotting two layers: first the border, then the coloured areas (with no internal borders).
  • map.1, map.2, etc.: map insets. These smaller map subsets help visualise the spatial patterns in regions of small, tightly-clustered SA2s, which is typical around major cities.
#----------------------------------------------------------------------------------
# Define subset maps and borders
#----------------------------------------------------------------------------------

# Create border map from full map
map.border <- unionSpatialPolygons(map, IDs = rep(1, N))
map.border <- fortify(map.border)

# Get map projection for insets
map.proj <- proj4string(map)

# User-defined function for creating inset windows
get.inset <- function(x1, x2, y1, y2, map, map.proj){
    Inset.window <- as(raster::extent(x1, x2, y1, y2), "SpatialPolygons")
    proj4string(Inset.window) <- map.proj
    map.inset <- gIntersection(map, Inset.window, byid = TRUE, drop_lower_td = TRUE, 
        id = sapply(map@polygons, function(x) x@ID))
    return(map.inset)
}

# City inset limits (determined by trial and error)
lims <- data.frame(
    xmin = c(152.6, 150.1, 146.5, 145.5),
    xmax = c(153.6, 151, 147, 146),
    ymin = -c(28, 23.8, 19.6, 17.3),
    ymax = -c(27, 22.9, 19, 16.7)
)

# Inset maps for major cities
map.1 <- get.inset(152.6, 153.6, -28, -27, map, map.proj)       # Brisbane
map.2 <- get.inset(150.1, 151, -23.8, -22.9, map, map.proj)     # Rockhampton
map.3 <- get.inset(146.5, 147, -19.6, -19, map, map.proj)       # Townsville
map.4 <- get.inset(145.5, 146, -17.3, -16.7, map, map.proj)     # Cairns

# Borders for insets
map.border.1 <- unionSpatialPolygons(map.1, IDs = rep(1, length(map.1)))
map.border.2 <- unionSpatialPolygons(map.2, IDs = rep(1, length(map.2)))
map.border.3 <- unionSpatialPolygons(map.3, IDs = rep(1, length(map.3)))
map.border.4 <- unionSpatialPolygons(map.4, IDs = rep(1, length(map.4)))

# Fortify
map.df <- fortify(map)
map.1 <- fortify(map.1)
map.2 <- fortify(map.2)
map.3 <- fortify(map.3)
map.4 <- fortify(map.4)
map.border.1 <- fortify(map.border.1)
map.border.2 <- fortify(map.border.2)
map.border.3 <- fortify(map.border.3)
map.border.4 <- fortify(map.border.4)

# Make shapefile dataframe IDs numeric
map.df$id <- as.numeric(map.df$id) + 1  # Need to add 1 so range is 1:N
map.1$id <- as.numeric(map.1$id) + 1
map.2$id <- as.numeric(map.2$id) + 1
map.3$id <- as.numeric(map.3$id) + 1
map.4$id <- as.numeric(map.4$id) + 1

# Quick plot
ggplot(data = map.df, aes(x = long, y = lat, group = group)) +
    geom_polygon(colour = "black", aes(fill = factor(id))) +
    coord_map() +
    scale_fill_discrete(guide = FALSE)

This is just a plot of the 526 SA2s coloured by their ID value, \(1,\ldots,N\). (Note how the internal borders obscure the patterns in the southeast.) Now let's map some of the spatial information encoded within the shapefile to each of these areas. We will re-use the code for computing the distances between centroids (see the section on the Geostatistical Model, Section 5.2.4). Since the average residential population for each area is also encoded in the shapefile, we will plot this too.

#----------------------------------------------------------------------------------
# Compute distances between centroids
#----------------------------------------------------------------------------------

# Determine the longitude and latitude of centroids
Centroids <- matrix(NA, N, 2)
for(i in 1:N){
    bbox <- bbox(map@polygons[[i]]) # Bounding box of area
    # Midpoint of the bounding box (a cheap approximation to the centroid)
    Centroids[i,] <- c(bbox[1,1] + (bbox[1,2] - bbox[1,1]) / 2,
        bbox[2,1] + (bbox[2,2] - bbox[2,1]) / 2)
}
Centroids <- data.frame(Centroids)
names(Centroids) <- c("long", "lat")

# Compute the distance matrix (Great Circle distance; WGS84 ellipsoid)
d <- matrix(NA, N, N)
for(i in 1:N){
    d[i,] <- spDistsN1(as.matrix(Centroids), 
        as.matrix(Centroids[i,]), longlat = TRUE)
}
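
As an aside, the midpoints computed above are bounding-box centres, which only approximate the true centroids. If exact geometric centroids are preferred, the rgeos function gCentroid is one option (a sketch; cent and Centroids.alt are new names used for illustration):

# Alternative: true polygon centroids via rgeos
cent <- gCentroid(map, byid = TRUE)
Centroids.alt <- data.frame(long = coordinates(cent)[, 1],
    lat = coordinates(cent)[, 2])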

#----------------------------------------------------------------------------------
# Plotting distances from a selected area
#----------------------------------------------------------------------------------

# Use area "175" (Roma) for illustration
dist <- d[175,]

# For clarity, truncate distances > 1000 km so these areas have the same colour
dist[which(dist > 1000)] <- 1000

# Define fill colours as hexadecimal values (change as desired)
Fill.colours <- c("white", "#00ffff", "#00bfff", "#0040ff", "#0000cc", "#000099")
Fill.values <- c(0, 100, 250, 500, 750, 1000)
Fill.values.r <- rescale(Fill.values, from = range(dist), na.rm = TRUE)
Breaks.fill <- Fill.values
Breaks.fill[2] <- NA

# Create dataframe for ggplot
Append <- data.frame(id = 1:N, dist = dist)

# Merge values with shapefile dataframe
dat <- inner_join(map.df, Append, by = "id")

# Set up basic components of the plot
gg.base <-  ggplot(data = NULL, aes(x = long, y = lat, group = group)) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    theme_void() +
    guides(fill = guide_colourbar(barheight = 20)) +
    coord_map() # Keeps x-y ratio consistent

# Add visible layers for borders and fill colours
gg.dist <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = dist), color = NA) +
    scale_fill_gradientn("Distance", 
        colours = Fill.colours, 
        values = Fill.values.r,
        breaks = Breaks.fill, 
        labels = Breaks.fill,
        limits = range(Fill.values))

#----------------------------------------------------------------------------------
# Plotting residential population
#----------------------------------------------------------------------------------

# For clarity, truncate pop > 25000 so these areas have the same colour
pop <- map$pop
pop[which(pop > 25000)] <- 25000

# Define fill colours as hexadecimal values (change as desired)
Fill.colours <- c("white", "white", "#e6ffe6", "#80ff9f", "#00cc33", "#009926", "#00661a")
Fill.values <- c(0, 2, 5, 10, 15, 20, 25) * 1000
Fill.values.r <- rescale(Fill.values, from = range(pop), na.rm = TRUE)
Breaks.fill <- Fill.values
Breaks.fill[2] <- NA

# Create dataframe for ggplot
Append <- data.frame(id = 1:N, pop = pop)

# Merge values with shapefile dataframe
dat <- inner_join(map.df, Append, by = "id")

# Add visible layers for borders and fill colours
gg.pop <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = pop), color = NA) +
    scale_fill_gradientn("Population", 
        colours = Fill.colours, 
        values = Fill.values.r,
        breaks = Breaks.fill, 
        labels = Breaks.fill,
        limits = range(Fill.values)) +
    annotate("rect",
        xmin = lims$xmin - 0.01, xmax = lims$xmax + 0.01,
        ymin = lims$ymin - 0.01, ymax = lims$ymax + 0.01,
        colour = "black", fill = NA)    # This creates a box around map inset locations

# Render plots
grid.arrange(
    gg.dist,
    gg.pop,
    nrow = 1
)

We can also look at the city insets. Here are the city insets for the population map.

# Create inset maps
gg.inset <- vector("list", 4)
gg.base.inset <- gg.base + guides(fill = FALSE, alpha = FALSE) +
    scale_fill_gradientn("Population", 
        colours = Fill.colours, 
        values = Fill.values.r,
        breaks = Breaks.fill, 
        labels = Breaks.fill,
        limits = range(Append$pop))

# Brisbane
dat <- inner_join(map.1, Append, by = "id")
gg.inset[[1]] <- gg.base.inset + 
    geom_polygon(data = map.border.1, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = pop), color = NA) +
    ggtitle("Brisbane")

# Rockhampton
dat <- inner_join(map.2, Append, by = "id")
gg.inset[[2]] <- gg.base.inset + 
    geom_polygon(data = map.border.2, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = pop), color = NA) +
    ggtitle("Rockhampton")

# Townsville
dat <- inner_join(map.3, Append, by = "id")
gg.inset[[3]] <- gg.base.inset + 
    geom_polygon(data = map.border.3, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = pop), color = NA) +
    ggtitle("Townsville")

# Cairns
dat <- inner_join(map.4, Append, by = "id")
gg.inset[[4]] <- gg.base.inset + 
    geom_polygon(data = map.border.4, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = pop), color = NA) +
    ggtitle("Cairns")

# Render the plots
grid.arrange(
    gg.inset[[1]], 
    gg.inset[[2]],
    arrangeGrob(gg.inset[[4]], gg.inset[[3]], ncol = 1),
    ncol = 3, widths = c(10, 9, 5)
)

7.2 Exploring Simulated Incidence Data

Now let's import some simulated incidence data and explore it.

# Import data
data <- read.csv("Downloaded Data/7.2 QLD Sim Incidence.csv")
head(data)
##    ID area_code              area_name observed    expected   covariate raw.sir
## 1  14     31014 Brisbane Port - Lytton        0 0.029574176 -1.54999165       0
## 2  99     31099     Enoggera Reservoir        0 0.093771779 -1.14331023       0
## 3 102     31102         Mount Coot-tha        0 0.000000000 -0.03879053       0
## 4 150     31150             Lamb Range        0 0.000000000 -0.80058154       0
## 5 162     31162           Wooroonooran        0 0.008655857 -1.26765497       0
## 6 197     31197            Callemondah        0 0.155084096 -2.04983563       0
# Re-order records to match shapefile IDs
data <- data[match(map$SA2_5DIG11, data$area_code), ]

# Observed (raw) SIR ( = observed / expected)
SIR.raw <- data$raw.sir
# OR compute it directly
SIR.raw <- data$observed/data$expected
SIR.raw[which(is.nan(SIR.raw))] <- 0    # 0/0 (zero expected count) gives NaN; define these as 0

# Truncate raw SIR values
SIR.raw[which(SIR.raw > 4)] <- 4
SIR.raw[which(SIR.raw < 0.25)] <- 0.25

# Truncate observed values
obs <- data$observed
obs[which(obs > 80)] <- 80

# Dataframe of the variables we need to append to the shapefile
Append <- data.frame(
    id = data$ID,
    obs = obs,
    SIR = SIR.raw
)

# Fill colours for observed and SIR variables
Fill.colours.obs <- c(grey(seq(1, 0, -0.25)), grey(0))
Fill.colours.SIR <- c("#2C7BB6", "#2C7BB6", "#ABD9E9", "#FFFFBF", "#FDAE61", "#D7191C", 
    "#D7191C")

# Values corresponding to the fill colours
Fill.values.obs <- c(seq(0, 80, 20), max(obs))  # Solid black for obs > 80
Fill.values.SIR <- c(log2(0.25), seq(log2(1/1.5), log2(1.5), length = 5), log2(4))
# Note: we use logged values for the SIR so that the colour gradient is linear

# Rescaled fill values
Fill.values.obs.r <- rescale(Fill.values.obs, from = range(Append$obs))
Fill.values.SIR.r <- rescale(Fill.values.SIR, from = range(log2(Append$SIR)))

# Merge values with shapefile dataframe
dat <- inner_join(map.df, Append, by = "id")

# Set up basic components of the plot
gg.base <- ggplot(data = NULL, aes(x = long, y = lat, group = group)) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    theme_void() + 
    theme(legend.position = "bottom") +
    guides(fill = guide_colourbar(barwidth = 15)) +
    coord_map() # Keeps x-y ratio consistent

# Add visible layers for borders and fill colours
gg.obs <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = obs), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.obs, 
        values = Fill.values.obs.r,
        limits = range(Fill.values.obs)) +
    ggtitle("Observed Cases")

gg.SIR.raw <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.SIR, 
        values = Fill.values.SIR.r,
        limits = range(log2(Append$SIR)),
        breaks = seq(-2, 2), 
        labels = round(2^seq(-2, 2), 3)) +
    ggtitle("Observed SIR") +
    annotate("rect",
        xmin = lims$xmin - 0.01, xmax = lims$xmax + 0.01,
        ymin = lims$ymin - 0.01, ymax = lims$ymax + 0.01,
        colour = "black", fill = NA)

# Render the plots
grid.arrange(gg.obs, gg.SIR.raw, nrow = 1)

It is difficult to determine what patterns may exist in more densely populated regions where SA2s are smaller. The city insets will show this more clearly. Here are the insets for the observed SIR.

# Create inset maps
gg.inset <- vector("list", 4)
gg.base.inset <- gg.base + guides(fill = FALSE, alpha = FALSE) +
    scale_fill_gradientn("", 
        colours = Fill.colours.SIR, 
        values = Fill.values.SIR.r,
        limits = range(log2(Append$SIR))
    )

# Brisbane
dat <- inner_join(map.1, Append, by = "id")
gg.inset[[1]] <- gg.base.inset + 
    geom_polygon(data = map.border.1, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Brisbane")

# Rockhampton
dat <- inner_join(map.2, Append, by = "id")
gg.inset[[2]] <- gg.base.inset + 
    geom_polygon(data = map.border.2, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Rockhampton")

# Townsville
dat <- inner_join(map.3, Append, by = "id")
gg.inset[[3]] <- gg.base.inset + 
    geom_polygon(data = map.border.3, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Townsville")

# Cairns
dat <- inner_join(map.4, Append, by = "id")
gg.inset[[4]] <- gg.base.inset + 
    geom_polygon(data = map.border.4, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Cairns")

# Render the plots
grid.arrange(
    gg.inset[[1]], 
    gg.inset[[2]],
    arrangeGrob(gg.inset[[4]], gg.inset[[3]], ncol = 1),
    ncol = 3, widths = c(10, 9, 5)
)

7.3 Fitting the Leroux Model to the Incidence Data

Looking at the plots of the simulated incidence data above, there do not appear to be very strong spatial patterns amongst the observed values or observed SIR values, although there does appear to be some degree of spatial dependence, given that the observed SIR values are typically elevated in the southeast region (Brisbane) compared to the rest of Queensland. But spatial patterns in the data are often obscured by other variables, such as covariates, the expected values (which act as an offset to account for population differences), and random variation (noise). Put another way, each of these variables can be thought of as a "layer" contributing to the relative risk surface and ultimately to the observed values. The layer which accounts for the spatial autocorrelation, after accounting for each of the other layers, should reveal a spatial pattern with some degree of smoothness (spatial autocorrelation). When we fit a spatial model to these data, the spatial random effect estimates this layer. This layer can be thought of as the underlying spatial random field, and can be just as interesting as the relative risk surface itself. By fitting a Bayesian model, we should be able to estimate each of the individual layers comprising the relative risk surface.
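
To make the layer metaphor concrete, the log-relative risk for area \(i\) in the model specified below decomposes as \[ \log(\mbox{SIR}_i) = \mu_i = \underbrace{\beta_0}_{\mbox{baseline}} + \underbrace{\beta_1 x_i}_{\mbox{covariate}} + \underbrace{S_i}_{\mbox{spatial field}} \] with the Poisson variation of \(Y_i\) around \(E_i e^{\mu_i}\) playing the role of the noise layer.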

Here we demonstrate how to fit the preferred incidence model from the ACA: the Leroux model (refer to Section 5.2.3). For this particular model, there are numerous software options that can be used to estimate the parameters. One of the easiest methods is using MCMC estimation in the CARBayes R package (Lee 2013).

Note that CARBayes requires a spatial weights matrix as an input to the model. We will use the spdep package again to create the binary, first-order adjacency weights matrix (refer to Sections 4.7 and 4.8). But recall the area IDs that we saved earlier for areas with zero or small residential population: we need to exclude these areas from the adjacency weights matrix as well.

# Create spatial weights matrix, W
W <- nb2mat(poly2nb(map), style = "B", zero.policy = TRUE)

# Remove rows/columns for excluded areas
W <- W[-SA2.no.pop, -SA2.no.pop]

# Visualise the weights matrix
image(W)    # red = 0, white = 1
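
Before proceeding, it is worth counting the areas with no neighbours at all, i.e. the empty rows in W (a quick check using the matrix just created):

# Count areas with no neighbours (empty rows in W)
sum(rowSums(W) == 0)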

Unfortunately, there are some areas with no immediately adjacent neighbours. This creates a problem when trying to fit a spatial model. (If you run the model-fitting code chunk below at this point, CARBayes will encounter a run-time error as a result.) There are two solutions.

  1. For the areas without neighbours, fix the spatial random effect at zero, and only estimate it for the remaining areas.
  2. Construct artificial neighbours for these areas by linking them to one or more other areas. These links could be based on tangible connections like bridges, or less tangible connections like ferry and air transport terminals, or simply the nearest (non-adjacent) neighbour.

The second option is generally easier and, provided the links are sensible, arguably better, since it provides some spatial smoothing for all areas. We adopt the second option here, using the nearest-neighbour approach.

# Which areas have no neighbours? 
no.neighbours <- which(rowSums(W) == 0)

# Determine which areas are closest to these areas
nearest.area <- rep(NA, length(no.neighbours))
for(i in 1:length(no.neighbours)){
    distances <- d[no.neighbours[i],]
    # Second-smallest distance: the smallest is the zero distance from the area to itself
    nearest.area[i] <- which(distances == distances[order(distances)][2])
    
    # Update W
    W[no.neighbours[i], nearest.area[i]] <- 1
    W[nearest.area[i], no.neighbours[i]] <- 1   # Keep W symmetric
}

Now we are ready to fit the model. Note that the dataset includes a covariate. The reason for including a covariate may be to control for a known or suspected risk factor, so that residual spatial patterns can be estimated. In practice, this may be a socioeconomic variable, or a variable that delineates between urban and rural areas. An unadjusted model (i.e. without covariates) may also be of interest, but for the sake of illustration, we will include the simulated covariate here.

As noted earlier, CARBayes includes a global intercept term by default, so the \(\mu_0\) term in the Leroux model is zero. Assuming some weakly informative priors, we will fit the following model: \[Y_i \sim \mbox{Po}\left(E_i e^{\mu_i}\right)\] where \(Y_i\) and \(E_i\) are the observed and expected counts respectively,

\[ \mu_i = \beta_0 + \beta_1 x_i + S_i \\ \beta_0 \sim \mathcal{N}(0, 100) \\ \beta_1 \sim \mathcal{N}(0, 100) \\ S_i \mid \bm{S}_{\smallsetminus i} \sim \mathcal{N}\left(\frac{\rho \sum_j{w_{ij}S_j}}{\rho \sum_j{w_{ij}}+1-\rho}, \frac{\sigma_S^2}{\rho \sum_j{w_{ij}}+1-\rho}\right) \\ \rho \sim \mbox{Uni}(0, 1) \\ \sigma_S^2 \sim \mathcal{IG}(1, 100) \] where the Gaussian distributions are parameterised in terms of mean and variance, and the inverse gamma distribution in terms of shape and rate. (Note that CARBayes parameterises the inverse gamma in terms of shape and scale, where scale = 1/rate, so \(\mathcal{IG}(1, 100)\) corresponds to prior.tau2 = c(1, 0.01) in the code below.)

# Fixed values (input data)
y <- data$observed[-SA2.no.pop]         # Observed values
E <- data$expected[-SA2.no.pop]         # Expected values
x <- data$covariate[-SA2.no.pop]        # Covariate

# MCMC parameters
M.burnin <- 20000       # Number of burn-in iterations (discarded)
M <- 5000               # Number of iterations retained (after thinning)
n.thin <- 5             # Thinning factor

# Fit model using CARBayes
set.seed(6789)          # For reproducibility
MCMC <- S.CARleroux(
    formula = y ~ offset(log(E)) + x, 
    data = data.frame(y = y, E = E, x = x),
    family = "poisson",
    prior.mean.beta = c(0, 0),
    prior.var.beta = c(100, 100),
    prior.tau2 = c(1, 0.01),        # Shape and scale
    rho = NULL,
    W = W,
    burnin = M.burnin,
    n.sample = M.burnin + (M * n.thin), # Total iterations
    thin = n.thin,
    verbose = FALSE
)

# Store the list of parameters for convenience (ignore the other stuff in MCMC)
pars <- MCMC$samples
names(pars)
## [1] "beta"   "phi"    "tau2"   "rho"    "fitted" "Y"

Our results are stored in the object MCMC, and the posterior samples for each parameter are stored in the object pars. Note that CARBayes only returns samples for the stochastic parameters in the model (those we specified priors for, plus \(\rho\), which is estimated with a uniform prior when no fixed value is supplied via the rho argument). This includes \(\beta_0, \beta_1, \sigma_S^2\), and \(\rho\). If we want to recover other quantities calculated from these, e.g. the log-relative risk, \(\mu_i\), or the relative risk, \(\exp(\mu_i)\), then these need to be computed manually first.

Note also that CARBayes uses different symbols for some parameters: phi is what we denote by \(S\), and tau2 is what we denote by \(\sigma_S^2\). Additionally, beta is a two-column matrix: the first column contains the estimates for \(\beta_0\) and the second column contains the estimates for \(\beta_1\). For the sake of consistency, we will explicitly rename the parameters to match our notation.

# Renaming phi as S
names(pars)[which(names(pars) == "phi")] <- "S"

# Renaming tau2 as sigma.sq.s
names(pars)[which(names(pars) == "tau2")] <- "sigma.sq.s"

# Separating beta into beta_0 and beta_1
pars$beta.0 <- as.numeric(pars$beta[,1])
pars$beta.1 <- as.numeric(pars$beta[,2])
pars$beta <- NULL

# Recreate mu (log-relative risk) for every MCMC draw: mu = beta_0 + beta_1 * x + S
pars$mu <- pars$beta.0 + outer(pars$beta.1, x) + pars$S

# Renamed parameters
names(pars)
## [1] "S"          "sigma.sq.s" "rho"        "fitted"     "Y"          "beta.0"     "beta.1"     "mu"

There are many ways we can summarise these results. Useful summaries include the marginal posterior distributions, which indicate the estimates of the model parameters and their uncertainty, and spatial patterns derived from the spatial parameters which we can summarise using thematic maps similar to those we created earlier.
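
Before plotting, quick numerical point and interval estimates of the univariate parameters can be read straight from the samples. This is a minimal sketch using the pars object above:

# Posterior medians and 95% credible intervals for the univariate parameters
t(sapply(
    list(beta.0 = pars$beta.0, beta.1 = pars$beta.1,
        rho = pars$rho, sigma.sq.s = pars$sigma.sq.s),
    function(x) quantile(as.numeric(x), probs = c(0.5, 0.025, 0.975))
))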

First, let's summarise the marginal posterior distributions. Graphical summaries are useful here, but the type of plot depends on the dimensions (and interpretation) of the particular parameter. For univariate parameters, like the overall intercept \(\beta_0\), a density plot is sensible. For vector-valued parameters, individual densities may be feasible if there are only a few elements (using transparency and allowing them to overlap); otherwise a less cluttered approach is to summarise each element with a point and interval estimate (e.g. median and 95% credible interval), using one axis to show the index. If the elements are ordered, then it makes sense to plot the estimates in that order. For spatial parameters like \(S\) and \(\mu\), the indices are usually not that meaningful, so another option is to re-order the areas according to the magnitude of the point estimate. This emphasises extreme values (at the tails) and makes the plot much clearer, especially when there are a large number of areas. Additionally, the relative risk (i.e. SIR) is best interpreted on a ratio scale rather than a linear scale. The easiest way to accomplish this is to plot the log-relative risk and adjust the axis labels to show the exponentiated values.
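
One idiom in the code below deserves a note: order(order(v)) returns the rank of each element of v (smallest = 1). A quick illustration:

# order(order(v)) yields ranks
v <- c(5.2, 1.1, 3.3)
order(v)            # 2 3 1: indices that would sort v
order(order(v))     # 3 1 2: rank of each element of v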

# List to store the six sub-plots
gg.post <- vector("list", 6)

# Plot for beta_0
dat <- data.frame(val = pars$beta.0)
gg.post[[1]] <- ggplot(dat, aes(x = val)) + 
    geom_density(fill = "black", alpha = 0.5, color = NA) +
    scale_x_continuous(expression(beta[0])) +
    scale_y_continuous("")

# Plot for beta_1
dat <- data.frame(val = pars$beta.1)
gg.post[[2]] <- ggplot(dat, aes(x = val)) + 
    geom_density(fill = "black", alpha = 0.5, color = NA) +
    scale_x_continuous(expression(beta[1])) +
    scale_y_continuous("")

# Plot for S
dat <- pars$S
mean <- apply(dat, 2, mean)
CI.lower <- apply(dat, 2, quantile, probs = 0.025)
CI.upper <- apply(dat, 2, quantile, probs = 0.975)
rank <- order(order(mean))
dat <- data.frame(id = rank, mean, CI.lower, CI.upper)
gg.post[[3]] <- ggplot(dat, aes(x = id, y = mean)) + 
    geom_pointrange(aes(ymin = CI.lower, ymax = CI.upper), 
        alpha = 0.3, fatten = 0.1, colour = "grey50") +
    geom_point(size = 1) +
    scale_x_continuous("Ranked SA2", breaks = seq(0, N, 50)) +
    scale_y_continuous(expression(S)) +
    coord_flip()

# Plot for relative risk (SIR)
SIR.labels <- c(0.25, 0.5, 1, 2, 4) # Ratio scale labels
dat <- pars$mu
mean <- apply(dat, 2, mean)
CI.lower <- apply(dat, 2, quantile, probs = 0.025)
CI.upper <- apply(dat, 2, quantile, probs = 0.975)
rank <- order(order(mean))
dat <- data.frame(id = rank, mean, CI.lower, CI.upper)
gg.post[[4]] <- ggplot(dat, aes(x = id, y = mean)) + 
    geom_pointrange(aes(ymin = CI.lower, ymax = CI.upper), 
        alpha = 0.3, fatten = 0.1, colour = "grey50") +
    geom_point(size = 1) +
    scale_x_continuous("Ranked SA2", breaks = seq(0, N, 50)) +
    scale_y_continuous("SIR", breaks = log(SIR.labels), labels = SIR.labels) +
    coord_flip(ylim = log(range(SIR.labels)))

# Plot for rho
dat <- data.frame(val = pars$rho)
gg.post[[5]] <- ggplot(dat, aes(x = val)) + 
    geom_density(fill = "black", alpha = 0.5, color = NA) +
    scale_x_continuous(expression(rho)) +
    scale_y_continuous("")

# Plot for sigma^2
dat <- data.frame(val = pars$sigma.sq.s)
gg.post[[6]] <- ggplot(dat, aes(x = val)) + 
    geom_density(fill = "black", alpha = 0.5, color = NA) +
    scale_x_continuous(expression(sigma[S]^2)) +
    scale_y_continuous("")

# View combined plots
grid.arrange(
    gg.post[[1]], gg.post[[2]], gg.post[[3]], gg.post[[4]], gg.post[[5]], gg.post[[6]], 
    ncol = 3
)

Now for the spatial patterns. Here we ignore the uncertainty and just plot the point estimates. (For the ACA, we include a feature which allows the user to toggle an additional layer that incorporates the uncertainty into the colours shown.) We can decompose the log-risk surface (\(\mu_i\)) into its components: the intercept, \(\beta_0\); the covariate effect, \(\beta_1 x_i\); and the spatial random effect, \(S_i\). The intercept can also be visualised as a spatial layer, although every area in this layer has the same value.

log.SIR <- apply(pars$mu, 2, median)

# For clarity, truncate extreme SIR values
clip <- c(0.25, 4)  # Not necessary in this example since all values are within this range
SIR.clipped <- exp(log.SIR)
SIR.clipped[which(SIR.clipped < clip[1])] <- clip[1]
SIR.clipped[which(SIR.clipped > clip[2])] <- clip[2]

# Dataframe of the parameters we need to append to the shapefile
Append <- data.frame(
    id = (1:N)[-SA2.no.pop],
    int = median(pars$beta.0) + rnorm(N-length(SA2.no.pop), 0, 0.001),  # Tiny jitter avoids a zero-width colour-scale range
    S = apply(pars$S, 2, median),
    cov = median(pars$beta.1) * x,
    log.SIR = log.SIR,
    SIR = SIR.clipped
)

# Merge values with shapefile dataframe
dat <- inner_join(map.df, Append, by = "id")

gg.base <-  ggplot(data = NULL, aes(x = long, y = lat, group = group)) +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    theme_void() + 
    theme(legend.position = "bottom") +
    guides(fill = guide_colourbar(barwidth = 8)) +
    coord_map() # Keeps x-y ratio consistent

# Fill colours
Fill.colours.linear <- c("#00bbcc", "#66ffff", "white", "#d580ff", "#8600b3")
Fill.colours.SIR <- c("#2C7BB6", "#ABD9E9", "#FFFFBF", "#FDAE61", "#D7191C")

# Range of the linear component values (for a consistent colour legend)
Lims <- abs(range(Append[,2:5]))
Lims <- Lims[which.max(Lims)]
Fill.values.linear <- seq(-Lims, Lims, length = 5)

# Range of (truncated) ratio scale values and breaks
Lims <- log2(range(Append$SIR))
Fill.values.SIR <- seq(log2(1/1.5), log2(1.5), length = 5)
if(Lims[2] > Fill.values.SIR[5]){
    Fill.values.SIR <- c(Fill.values.SIR, Lims[2])
    Fill.colours.SIR <- c(Fill.colours.SIR, Fill.colours.SIR[5])
}
if(Lims[1] < Fill.values.SIR[1]){
    Fill.values.SIR <- c(Lims[1], Fill.values.SIR)
    Fill.colours.SIR <- c(Fill.colours.SIR[1], Fill.colours.SIR)
}

# Rescaled fill values
Values.1 <- rescale(Fill.values.linear, from = range(Append$int))
Values.2 <- rescale(Fill.values.linear, from = range(Append$S))
Values.3 <- rescale(Fill.values.linear, from = range(Append$cov))
Values.4 <- rescale(Fill.values.linear, from = range(Append$log.SIR))
Values.5 <- rescale(Fill.values.SIR, from = range(log2(Append$SIR)))

Breaks.SIR <- c(0.5, 1/1.5, 1, 1.5, 2)

gg.1 <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = int), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.linear, 
        values = Values.1,
        breaks = round(median(pars$beta.0), 3)) +
    ggtitle("Intercept")

gg.2 <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = S), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.linear, 
        values = Values.2,
        breaks = seq(-10, 10, 0.2)) +
    ggtitle("Spatial Random Effect")

gg.3 <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = cov), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.linear, 
        values = Values.3,
        breaks = seq(-10, 10, 0.25)) +
    ggtitle("Covariate Effect")

gg.4 <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log.SIR), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.linear, 
        values = Values.4,
        breaks = seq(-10, 10, 0.25)) +
    ggtitle("Log-SIR")

gg.5 <- gg.base + 
    geom_polygon(data = map.border, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    scale_fill_gradientn("", 
        colours = Fill.colours.SIR, 
        values = Values.5,
        breaks = log2(Breaks.SIR), 
        labels = round(Breaks.SIR, 3)) +
    ggtitle("SIR") +
    annotate("rect",
        xmin = lims$xmin - 0.01, xmax = lims$xmax + 0.01,
        ymin = lims$ymin - 0.01, ymax = lims$ymax + 0.01,
        colour = "black", fill = NA)

# View combined plots
grid.arrange(
    arrangeGrob(gg.1, gg.2, gg.3, gg.4, gg.5, nrow = 1)
)

Here are the city insets for the SIR estimates:

# Create inset maps
gg.inset <- vector("list", 4)
gg.base.inset <- gg.base + guides(fill = FALSE, alpha = FALSE) +
    scale_fill_gradientn("", 
        colours = Fill.colours.SIR, 
        values = Values.5,
        limits = range(log2(Append$SIR))
    )

# Brisbane
dat <- inner_join(map.1, Append, by = "id")
gg.inset[[1]] <- gg.base.inset + 
    geom_polygon(data = map.border.1, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Brisbane")

# Rockhampton
dat <- inner_join(map.2, Append, by = "id")
gg.inset[[2]] <- gg.base.inset + 
    geom_polygon(data = map.border.2, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Rockhampton")

# Townsville
dat <- inner_join(map.3, Append, by = "id")
gg.inset[[3]] <- gg.base.inset + 
    geom_polygon(data = map.border.3, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Townsville")

# Cairns
dat <- inner_join(map.4, Append, by = "id")
gg.inset[[4]] <- gg.base.inset + 
    geom_polygon(data = map.border.4, fill = "grey30", color = "black", size = 0.8) +
    geom_polygon(data = dat, aes(fill = log2(SIR)), color = NA) +
    ggtitle("Cairns")

# Render the plots
grid.arrange(
    gg.inset[[1]], 
    gg.inset[[2]],
    arrangeGrob(gg.inset[[4]], gg.inset[[3]], ncol = 1),
    ncol = 3, widths = c(10, 9, 5)
)

Note that the dark grey areas correspond to the SA2s with no population. (We did not estimate the SIR for these areas; in practice, we probably would not have had an observed SIR either.)

To compare how much smoothing has actually occurred, we could plot the observed SIRs against the smoothed SIR estimates. Alternatively, we could plot both SIRs against the SA2s, ranked according to the magnitude of one of them. This is illustrated in the following plot, where the SA2s are ranked by the observed SIR.

# Truncate observed and smoothed SIR values
clip <- c(0.2, 10)     # Wider limits to show major deviations from raw SIR
raw.SIR <- data$raw.sir[-SA2.no.pop] %>% 
    replace(., which(. > clip[2]), clip[2]) %>%
    replace(., which(. < clip[1]), clip[1])
SIR <- exp(log.SIR) %>% 
    replace(., which(. > clip[2]), clip[2]) %>%
    replace(., which(. < clip[1]), clip[1])

# Fill colours
Fill.colours <- c("#2C7BB6", "#2C7BB6", "#ABD9E9", "#FFFFBF", "#FDAE61", "#D7191C", 
    "#D7191C")
Fill.values <- c(log2(1/4.5), seq(log2(1/1.5), log2(1.5), length = 5), log2(4.5))
Fill.values.r <- rescale(Fill.values, na.rm = TRUE)

Breaks.SIR <- c(0.25, 0.5, 1/1.5, 1, 1.5, 2, 4)

dat <- data.frame(
    id = (1:N)[-SA2.no.pop],
    SIR = SIR,
    raw.SIR = raw.SIR
)
dat$rank <- order(order(dat$raw.SIR))

ggplot(dat) +
    geom_hline(yintercept = 1, size = 1, colour = grey(0.3)) +
    geom_point(aes(x = rank, y = raw.SIR, colour = log2(raw.SIR), shape = "Observed"), 
        size = 3) +
    geom_point(aes(x = rank, y = SIR, shape = "Smoothed"), size = 1.5) +
    scale_x_continuous("Ranked SA2", expand = c(0.01, 0.01)) +
    scale_y_log10("", breaks = Breaks.SIR,
        labels = round(Breaks.SIR, 3),
        expand = c(0.01, 0.01)) +
    theme(panel.grid.minor.y = element_blank()) +
    scale_colour_gradientn("",
        colours = Fill.colours,
        values = Fill.values.r,
        breaks = log2(Breaks.SIR),
        labels = round(Breaks.SIR, 3),
        limits = range(Fill.values)
    ) +
    scale_shape_manual("",
        breaks = c("Observed", "Smoothed"),
        values = c("Observed" = 17, "Smoothed" = 16)
    ) +
    guides(
        colour = guide_colourbar(barheight = unit(8, "cm"))
    )

This plot reveals several things about the spatial smoothing.

  • The overall effect is clear: the observed SIRs (coloured triangles) are generally smoothed towards the average (SIR = 1). The estimated SIRs all lie within the range 0.6 to 1.3 (range(SIR)), while the observed SIRs are generally more extreme.
  • However, in some cases the SIR is smoothed further away from 1, or 'flips' sides, i.e. is smoothed past 1 to the opposite side. This happens because values are smoothed towards the mean of the neighbouring SIR values (the local mean), as specified by the Leroux CAR prior. (For a more detailed explanation of this behaviour, see Duncan and Mengersen (2020).) A quick numerical check of both effects follows below.
  • The grey triangles (bottom-left) represent SA2s with an observed SIR of zero. These zeros are very unlikely to reflect the true SIR in those areas, so it is not surprising that they experience a great deal of smoothing.
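
As a quick numerical check of the first two points, we can count how many areas were smoothed towards 1, and how many flipped to the other side of 1. This is a sketch using the raw.SIR and SIR vectors from the previous chunk:

# Proportion of areas smoothed towards SIR = 1 (on the log scale)
mean(abs(log(SIR)) < abs(log(raw.SIR)))

# Number of areas whose estimate 'flipped' to the other side of 1
sum(log(SIR) * log(raw.SIR) < 0)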