# Model for cancer diagnosis

A rigorous process of model testing and evaluation was carried out before deciding on the final model (see report).

The model selected for cancer diagnosis is that of Leroux et al. 2000. It is one of many models designed for count data, and is a specific case of the three-stage hierarchical model proposed by Best et al. 2005 for disease mapping. The first stage is the likelihood model for the observed count data, that is, the number of cancer diagnoses \(Y_{i}\) in area \(i\):

$$Y_{i} \sim \text{Poisson} \left( E_{i}e^{ \theta _{i}} \right) ~~ \text{for } i=1, \ldots ,2148 \text{ areas}$$

where \(E_{i}\) represents the expected number of cancer diagnoses in area \(i\). The expected number of diagnoses was calculated for each SA2 of usual residence by summing over the 5-year age groups from age 15 onwards (15-19, 20-24, …, 85+):

$$E_{i}= \sum _{\text{age group }=1}^{15}\frac{\text{Number of cancers diagnosed in Australia}}{\text{Australian population}} \times \text{SA2 population}$$

where age group=1 represents ages 15-19 and age group=15 represents ages 85+. This used data aggregated over 10 years (2005-2014) or 5 years (2010-2014), depending on the cancer type.
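The expected-count calculation above can be sketched in a few lines. This is a minimal illustration only, assuming that the national rate and the SA2 population are both taken per age group before summing; the function name `expected_counts` and the toy numbers (3 age groups, 2 areas) are hypothetical and are not data from the report.

```python
import numpy as np

def expected_counts(national_cases, national_pop, sa2_pop):
    """Expected number of diagnoses E_i for each area.

    national_cases, national_pop: arrays of shape (n_age_groups,)
    sa2_pop: array of shape (n_areas, n_age_groups)
    """
    national_cases = np.asarray(national_cases, dtype=float)
    national_pop = np.asarray(national_pop, dtype=float)
    sa2_pop = np.asarray(sa2_pop, dtype=float)
    # Age-specific national rates (cases per person)
    rates = national_cases / national_pop
    # Apply each rate to the matching SA2 age-group population, then sum over age groups
    return sa2_pop @ rates

# Hypothetical toy inputs: 3 age groups and 2 areas
national_cases = [10, 50, 200]
national_pop = [1000, 1000, 1000]
sa2_pop = [[100, 100, 100],
           [200, 0, 50]]

E = expected_counts(national_cases, national_pop, sa2_pop)
print(E)
```

An observed count divided by the corresponding `E` entry gives a raw (unsmoothed) incidence ratio, which is the quantity the model then smooths.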

The key output from this model used in the maps is the standardised incidence ratio (SIR), which is calculated as \(\exp(\theta _{i})\). The second stage comprises an expression for the log-SIR \(\theta _{i}\):

$$ \theta _{i}= \beta _{0}+R_{i}. $$

The parameter \(\beta _{0}\) represents the overall fixed effect (or intercept), while \(R_{i}\) represents the spatial random effect for area \(i\). The specification of these spatial random effects determines how the smoothing is performed, and together with the specification of the intercept parameter, forms the third stage of the model in the form of prior distributions:

$$ \beta _{0} \sim \mathcal{N} \left( 0, 100000 \right) $$

$$ R_{i} \vert R_{\backslash i} \sim \mathcal{N} \left( \frac{ \rho \sum _{j} w_{ij}R_{j}}{ \rho \sum _{j} w_{ij}+1- \rho }, \frac{ \sigma _R^2}{ \rho \sum _{j} w_{ij}+1- \rho } \right) \quad \text{for } i=1, \ldots ,N \text{ areas} $$

The prior distribution for the intercept is a weakly informative Gaussian distribution with zero mean, while the random effects have the conditional distribution proposed by Leroux et al. 2000. Here \( R_{\backslash i}=\{R_1,\ldots,R_{i-1},R_{i+1},\ldots, R_N\}\) denotes the random effects of all areas other than \(i\). The conditional mean of each random effect combines the neighbouring random effects (with weight \(\rho\)) and a global mean of \(0\) (with weight \(1-\rho\)). To complete the Bayesian model specification, the variance \(\sigma_R^2\) is also given a weakly informative prior,

$$\sigma_R^2 \sim \mathcal{IG}(1, 0.01),$$

which is an inverse-gamma distribution with shape \(1\) and scale \(0.01\), and

$$\rho \sim \text{Uniform}(0, 1).$$
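To make the conditional distribution of \(R_{i}\) concrete, the sketch below evaluates its mean and variance for one area. It is an illustration only: the helper name `leroux_conditional`, the three-area adjacency matrix, and the values of the random effects, \(\rho\) and \(\sigma_R^2\) are all hypothetical.

```python
import numpy as np

def leroux_conditional(i, R, W, rho, sigma2):
    """Mean and variance of R_i given all other random effects."""
    R = np.asarray(R, dtype=float)
    w_i = np.asarray(W, dtype=float)[i].copy()
    w_i[i] = 0.0  # an area is never its own neighbour
    # Shared denominator: rho * (sum of weights for area i) + 1 - rho
    denom = rho * w_i.sum() + 1.0 - rho
    mean = rho * (w_i @ R) / denom
    var = sigma2 / denom
    return mean, var

# Hypothetical example: three areas in a row (1-2-3), binary weights
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
R = np.array([0.2, -0.1, 0.4])

mean_mid, var_mid = leroux_conditional(1, R, W, rho=0.5, sigma2=1.0)
print(mean_mid, var_mid)
```

For the middle area, which has two neighbours, the denominator is \(0.5 \times 2 + 0.5 = 1.5\), so areas with more neighbours receive a smaller conditional variance.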

If \(\rho=1\), the Leroux CAR prior reduces to the intrinsic CAR prior (i.e. complete smoothing across neighbours), whereas if \(\rho=0\), no spatial smoothing occurs and each area is independent of its neighbours. The weights \(w_{ij}\) represent the spatial proximity between areas \(i\) and \(j\), and collectively they define an \(N\times N\) spatial adjacency matrix \({\bf W}\). A binary weighting scheme was used: \(w_{ij}=1\) if areas \(i\) and \(j\) are considered to be neighbours, and \(w_{ij}=0\) otherwise.
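The two limiting cases can be checked numerically. The sketch below assumes the standard joint form of the Leroux prior, in which the precision matrix (up to the factor \(1/\sigma_R^2\)) is \(\rho(\mathbf{D}-\mathbf{W})+(1-\rho)\mathbf{I}\), with \(\mathbf{D}\) the diagonal matrix of neighbour counts. This joint form is not stated in the text above but corresponds to the conditional distributions given there; the function name and the toy adjacency matrix are hypothetical.

```python
import numpy as np

def leroux_precision(W, rho):
    """Joint precision of the random effects, up to the factor 1/sigma_R^2."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # neighbour counts on the diagonal
    return rho * (D - W) + (1.0 - rho) * np.eye(W.shape[0])

# Hypothetical binary adjacency matrix: three areas in a row (1-2-3)
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

Q_indep = leroux_precision(W, rho=0.0)  # identity: independent random effects
Q_icar = leroux_precision(W, rho=1.0)   # D - W: intrinsic CAR structure
Q_mid = leroux_precision(W, rho=0.5)    # partial smoothing

print(Q_indep)
print(Q_icar)
print(np.diag(Q_mid))
```

With \(\rho=0\) the matrix is the identity, so no information is shared between areas. With \(\rho=1\) every row sums to zero, which is the rank-deficient structure of the intrinsic CAR prior.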

## References

Best N, Richardson S, Thomson A. A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research. 2005; 14(1):35-59.

Leroux BG, Lei X, Breslow N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Halloran ME, Berry D (Eds). Statistical models in epidemiology, the environment and clinical trials. New York: Springer; 2000. p. 135-178.