
Chapter 1 Introduction

1.1 Purpose

The purpose of this eBook is to provide a practical guide for epidemiologists and other researchers who are interested in developing a cancer atlas. The methodology presented herein comprises fully Bayesian spatial models and novel visualisations, with the aim of presenting the modelled estimates as a useful, fine-resolution atlas.

The majority of the content in this eBook was originally presented to the New Zealand Ministry of Health (https://www.health.govt.nz/) as a short course from 19-22 November 2018. Since then, the content has been expanded with a more general audience in mind. The main modifications have been the addition of theory and explanations.

Some of the methods, examples, and guidance pertain specifically to the Australian Cancer Atlas (ACA) (https://atlas.cancer.org.au/), a project which was released in September 2018. This guide should help researchers overcome some of the same technical challenges and dilemmas faced by the authors in developing the ACA.

1.2 Software

R is the predominant software used throughout this eBook. The authors have included relevant R code so that readers can reproduce the examples. This code should run when copied and pasted into R, provided that the required packages have been installed and loaded. The following is a list of the R packages that need to be loaded (using the library() function), and the functions used from each.

library(dplyr)          # For %>%, arrange(), mutate(), select(), summarise(), filter(), 
                        #    group_by(), distinct(), left_join(), inner_join(), etc.
library(ggplot2)        # For ggplot()
library(maps)           # For map()
library(sf)             # For st_read(), st_as_sf(), st_make_valid(), st_touches(), 
                        #    st_centroid(), st_drop_geometry(), st_distance(), st_crs(), 
                        #    st_transform()
library(tidyr)          # For nest()
library(rmapshaper)     # For ms_simplify()   (optional)
library(readxl)         # For read_xlsx()
library(scales)         # For rescale()
library(gridExtra)      # For grid.arrange()
library(spdep)          # For poly2nb(), nb2mat(), nb2WB()
library(CARBayes)       # For S.CARbym(), S.CARleroux(), etc.
library(SpatialEpi)     # For pennLC data set
library(leaflet)        # For leaflet(), colorNumeric()
library(plotly)         # For ggplotly()
library(survival)       # For Surv(), survfit()
library(survminer)      # For ggsurvplot()
library(KMsurv)         # For lifetab()
library(popEpi)         # For get.yrs(), popmort and sire data sets
library(Epi)            # For Lexis()
library(doBy)           # For summaryBy()
library(splitstackshape) # For expandRows()
library(R2OpenBUGS)      # For bugs(), write.model()
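
Before loading the packages above, it can help to check which of them are already installed. The following is a minimal sketch rather than code from the original course: the helper missing_packages() is a hypothetical name introduced here for illustration, and the sketch assumes that every package in the list is available from CRAN under the name shown.

```r
# Packages listed in this section (assumed to be available from CRAN).
book_pkgs <- c("dplyr", "ggplot2", "maps", "sf", "tidyr", "rmapshaper",
               "readxl", "scales", "gridExtra", "spdep", "CARBayes",
               "SpatialEpi", "leaflet", "plotly", "survival", "survminer",
               "KMsurv", "popEpi", "Epi", "doBy", "splitstackshape",
               "R2OpenBUGS")

# Hypothetical helper (not part of the eBook or any package): return the
# names in `pkgs` that cannot be found in any library on this machine.
# system.file() returns "" when a package is not installed, and it does
# not load the package.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                  logical(1))
  pkgs[!found]
}

to_install <- missing_packages(book_pkgs)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The installation line is left commented out so that nothing is installed without the reader opting in.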

Additionally, there is a useful R package on GitHub that greatly facilitates the downloading and loading of spatial objects for mapping, used in Chapter 7. This package is optional (alternative code is provided). To install this package (first time only), use remotes::install_github().

# RUN ONCE
# install.packages("remotes")   # if remotes is not already installed
library(remotes)
remotes::install_github("runapp-aus/strayr")

This method of installation allows data to be loaded on-the-fly rather than pre-downloading all the spatial assets. To install the full absmapsdata package and for further documentation, visit https://github.com/wfmackey/absmapsdata.

In Chapter 8 and Section 9.9.5, OpenBUGS is used to estimate the parameters in a Bayesian model. OpenBUGS is stand-alone software that needs to be installed prior to invoking R2OpenBUGS::bugs(), which will run OpenBUGS in the background. To install OpenBUGS, follow the instructions provided on the University of Cambridge website. For Debian-based Linux distributions, OpenBUGS can be downloaded from here.

1.3 Structure of the Book

Chapter 2 provides an overview of the Australian Cancer Atlas (ACA), covering key aspects relating to administration, funding and support, the digital architecture, and important statistical methodology. The content and advice presented throughout this eBook are largely based on the ACA and the authors’ experiences while developing it.

Chapter 3 provides an overview of Bayesian statistical modelling and computation, which was the approach used in the ACA. Although no knowledge of Bayesian statistics is assumed, an understanding of fundamental concepts in statistical modelling and terminology will be beneficial to the reader. This chapter should equip the reader with enough knowledge to understand the theory behind this modelling approach, code and implement simple Bayesian models using freely available software, and interpret the model output.

Chapter 4 explains the concept of a spatial model in broad terms and specifically in the Bayesian context. A generic, fully Bayesian model for disease mapping is outlined. As a specific example, the commonly used conditional autoregressive (CAR) model is also described. A brief discussion of the spatial weights and hyperpriors follows. The chapter ends with two easy-to-follow demonstrations of using the R package CARBayes to implement a spatial model and produce some interactive visualisations.

Chapter 5 describes specific examples of spatial models that may be useful for modelling cancer incidence. These models were explored as part of the initial investigation into spatial models for use in the ACA. The chapter ends with results from a simulation study that was conducted as part of this investigation.

Chapter 6 discusses each of the criteria used to compare the models in the simulation study from the previous chapter. These criteria cover different characteristics of the models, including model goodness-of-fit, computation time, convergence, and the plausibility of the credible intervals.

Chapter 7 demonstrates how to fit the preferred spatial model emanating from the ACA investigation described in Chapter 5 to simulated incidence data. Some useful visualisations for making inference are also provided.

Chapter 8 follows a similar structure to the previous chapter. New Zealand was chosen as the region for this simulated data, both for variety and for the benefit of the participants for whom this content was originally developed. Other key differences to Chapter 7 include the use of a different spatial model and alternative software to estimate the parameters.

Chapter 9 turns the focus from incidence to survival, another important measure of cancer patient care. This chapter explains how survival can be estimated, the difference between all-cause survival and net survival and their implied assumptions, and how to interpret results. A generic Bayesian model formulation for modelling relative survival is outlined, followed by two examples, including the model used in the ACA. The chapter concludes with several code-intensive worked examples, including a small-area relative survival model.

Chapter 10 briefly describes the main lessons learnt from the development of the ACA.

Chapter 11 discusses the visualisations used in the ACA, with a focus on how these tools can be used to visualise the uncertainty that is inherent in the statistical treatment of parameters. In particular, three novel visualisations that were developed for the ACA to address two different aspects of uncertainty are outlined: the wave plot, the V-plot, and a transparency feature on the maps. The chapter concludes with a comprehensive account of how the visualisation components were developed.

Chapter 12 uses specific examples from the ACA to demonstrate how incidence and survival estimates can be used to better understand the reasons for geographical patterns, and how these inferences can be communicated effectively to diverse target audiences.