A key feature of the Australian Cancer Atlas will be the presentation of ‘smoothed’ estimates. What are smoothed estimates? Well, hopefully, after reading this newsletter, you will have a better idea of what they are, how we generate them, and how they can be interpreted.
Why do we need spatial smoothing?
Two reasons: data privacy and statistical stability.
Data privacy relates to the responsibility of data custodians to protect the identity of individuals in their data.
Statistical stability relates to the inherent random fluctuation of statistics that occurs when there are small numbers of records; the smaller the numbers, the more they fluctuate, making correct interpretation difficult.
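To make this fluctuation concrete, here is a small simulation sketch. It is purely illustrative and uses made-up numbers: two hypothetical areas share the same underlying rate (50 cases per 100,000 people per year), but one has 1,000 residents and the other 200,000. The function names, populations and the random seed are all assumptions chosen for the example.

```python
import math
import random
import statistics


def poisson_draw(mean, rng):
    """Draw one Poisson-distributed count (Knuth's method; fine for modest means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def yearly_rates(population, true_rate_per_100k, n_years, rng):
    """Simulate the observed rate per 100,000 for each of n_years."""
    expected_cases = population * true_rate_per_100k / 100_000
    return [
        poisson_draw(expected_cases, rng) / population * 100_000
        for _ in range(n_years)
    ]


rng = random.Random(2018)  # arbitrary seed so the run is repeatable

# Same true rate in both areas; only the population differs.
small_area = yearly_rates(1_000, 50, 200, rng)
large_area = yearly_rates(200_000, 50, 200, rng)

print("Spread of yearly rates, small area:", round(statistics.pstdev(small_area), 1))
print("Spread of yearly rates, large area:", round(statistics.pstdev(large_area), 1))
```

Although the true rate is identical, the yearly rates in the small area swing far more widely than those in the large area, which is exactly the instability described above.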
Both issues are particularly important for geographical data, because the number of cases in each small area is often low. ‘Spatial smoothing’ addresses both.
What is spatial smoothing?
While standard methods of reporting disease burden typically adjust only for age and sex in each area, spatial smoothing also incorporates the geographical structure of the data. It does this by ‘borrowing’ information from the neighbouring geographical areas.
This provides greater stability to the estimates and greatly reduces any risk of individuals being identified.
These protective effects of spatial smoothing are most pronounced in areas where it is needed the most, that is, those with the smallest numbers of cases. Smoothed estimates are designed to reflect the real differences in the underlying rate or risk between areas.
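The idea of ‘borrowing’ from neighbours can be sketched with a simple weighted average. This is a deliberately simplified stand-in, not the method used for the Atlas: each area's observed rate is pulled towards the pooled rate of its neighbours, and the pull is stronger when the area's own population is small. The area names, counts and the `strength` constant are all hypothetical.

```python
def neighbour_smoothed_rates(cases, population, neighbours, strength=10_000):
    """Shrink each area's crude rate towards the pooled rate of its neighbours.

    `strength` is a hypothetical tuning constant (in people): the smaller an
    area's population is relative to it, the more the area borrows.
    """
    smoothed = {}
    for area in cases:
        crude = cases[area] / population[area]
        nb_cases = sum(cases[n] for n in neighbours[area])
        nb_pop = sum(population[n] for n in neighbours[area])
        pooled = nb_cases / nb_pop
        weight = population[area] / (population[area] + strength)
        smoothed[area] = weight * crude + (1 - weight) * pooled
    return smoothed


# Made-up data: area A is tiny, areas B and C are large.
cases = {"A": 3, "B": 40, "C": 66}
population = {"A": 500, "B": 80_000, "C": 120_000}
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

smoothed = neighbour_smoothed_rates(cases, population, neighbours)
for area in cases:
    crude_per_100k = cases[area] / population[area] * 100_000
    print(area, round(crude_per_100k, 1), "->", round(smoothed[area] * 100_000, 1))
```

In this toy example the tiny area A moves a long way from its extreme observed rate towards its neighbours, while the large areas B and C barely change. That matches the pattern described above: smoothing has the greatest effect where the numbers are smallest.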
The following hypothetical example shows the differences between the original data and the smoothed estimates. The red and purple areas reflect the extremely high and low values, while the yellow areas are close to the overall average.
Note how the observed values have a higher proportion of darker colours representing very high and low values, while the smoothed estimates are lighter shades, reflecting values closer to the combined average. Areas such as ● in the map above, which have changed from purple to yellow, are likely to have smaller populations, while areas keeping a more consistent colour, such as ■, or ▲, have higher numbers of cases, so are less impacted by the smoothing process.
How are the smoothed estimates calculated?
For the Australian Cancer Atlas, we are using Bayesian statistical models. The basic idea of the Bayesian method is that it uses probability distributions to describe the unknown quantities of interest. For the Atlas, these quantities are the cancer screening, incidence and survival rates in each small area. The probability distributions are calculated using the observed counts, population data and other (prior) information.
In our case, this prior information is drawn from the neighbouring geographical areas, as described earlier. The assumption is that the cancer burden or the factors that influence this burden in one geographical area are often similar to those in the neighbouring geographical areas. In this way, the Bayesian analysis enables us to supplement the data observed in each area, which leads to the more stable and robust “spatially smoothed estimates”. The probability distributions that are obtained for the cancer screening, incidence and survival rates in each small area reflect the uncertainty of these estimates and enable more appropriate comparisons to be made between areas or with the national average.
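For readers who like to see the mechanics, the following is a minimal sketch of how prior information and observed counts combine in a Bayesian update. It uses a textbook gamma-Poisson model, which is far simpler than the spatial models considered for the Atlas and is shown only to illustrate the principle. Here the quantity of interest is an area's risk relative to the national average, `expected` is the number of cases the area would have at the national rate, and `prior_mean` and `prior_strength` are hypothetical values standing in for what the neighbouring areas suggest.

```python
def posterior_relative_risk(observed, expected, prior_mean, prior_strength):
    """Gamma-Poisson update for an area's relative risk.

    Prior:      risk ~ Gamma(shape = prior_mean * prior_strength,
                             rate  = prior_strength)
    Likelihood: observed ~ Poisson(expected * risk)
    Posterior:  risk ~ Gamma(shape + observed, rate + expected)
    """
    post_shape = prior_mean * prior_strength + observed
    post_rate = prior_strength + expected
    return post_shape, post_rate


# Both hypothetical areas have twice as many cases as expected, and both
# have neighbours suggesting a risk close to average (prior mean 1.1).
small_shape, small_rate = posterior_relative_risk(4, 2, 1.1, 10)
large_shape, large_rate = posterior_relative_risk(400, 200, 1.1, 10)

print(round(small_shape / small_rate, 2))  # posterior mean: 1.25
print(round(large_shape / large_rate, 2))  # posterior mean: 1.96
```

With only 4 observed cases, the estimate for the small area sits much closer to what the neighbours suggest. With 400 cases, the area's own data dominate and the estimate stays near the observed ratio of 2.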
Does one number describe the cancer burden for where I live?
No. To understand the cancer burden in a particular area you need more than just one statistic. There are many factors that contribute to how many people are diagnosed with cancer, and how long they survive – some factors are known, some are not. However, they all impact on when a cancer develops in an individual, whether that cancer is detected and diagnosed, and whether that cancer progresses and leads to death. For these reasons, the observed cancer statistics will vary from year to year, leading to some “fuzziness” around the true value. Generally, this uncertainty is higher when the number of cancer cases is low – an important issue when considering any results from the Australian Cancer Atlas.
Another key feature of the Australian Cancer Atlas will be to describe the uncertainty around each of the cancer statistics. Since statistical models built within the Bayesian framework generate distributions around the estimates (as shown below), rather than a single value, they are ideally suited to quantifying the level of uncertainty.
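One common way to summarise such a distribution is a credible interval: a range that contains the true value with a stated probability, given the model. The sketch below is an illustration only. It assumes a simple gamma distribution for an area's risk relative to the national average, with a hypothetical weak prior, and reads the interval off simulated draws. The Atlas's own models and presentation of uncertainty may differ.

```python
import random


def credible_interval(observed, expected, prior_shape=1.0, prior_rate=1.0,
                      level=0.95, n_draws=20_000, seed=1):
    """Approximate credible interval for relative risk from posterior draws.

    Assumes a gamma posterior: Gamma(prior_shape + observed, prior_rate + expected).
    The default prior values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    shape = prior_shape + observed
    rate = prior_rate + expected
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    draws = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws))
    lower = draws[int((1 - level) / 2 * n_draws)]
    upper = draws[int((1 + level) / 2 * n_draws) - 1]
    return lower, upper


# Two hypothetical areas, each with twice the expected number of cases.
few_cases = credible_interval(observed=4, expected=2)
many_cases = credible_interval(observed=400, expected=200)

print("Few cases: ", [round(x, 2) for x in few_cases])
print("Many cases:", [round(x, 2) for x in many_cases])
```

Both areas have the same observed ratio, but the interval for the area with few cases is much wider. A single number would hide that difference, which is why the uncertainty is worth reporting alongside each estimate.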
More information about how we will visually present uncertainty will be provided in a subsequent post.
Is more information available?
If you would like to delve into the technical details, we have recently published a report examining a range of Bayesian statistical models that were considered for the Australian Cancer Atlas. This report also includes details on the process of comparing and selecting the models for the Australian Cancer Atlas. It can be downloaded from https://eprints.qut.edu.au/115590/