Edinburgh Cancer Informatics are part of a large partnership of universities and learning centres from across Europe, taking part in a mass data standardisation programme to investigate cancer survival rates from date of diagnosis.
The aim of this Europe-wide project is to examine the incidence and survival of certain cancers, and to see how well mathematical models can predict cancer survival 1, 5, and 10 years after the first diagnosis using real-world data. Clinical trial data is used with extrapolation techniques to estimate long-term overall survival (OS).

However, these extrapolations are often a key source of decision uncertainty, because they are forecasts of the future based only on observed data. Real-world evidence such as medical records can help to address this uncertainty. This work could improve patient care by indirectly informing public health policy, through evaluating the usefulness of real-world data in survival prediction for common cancers over the lifespan.
The primary outcome of this study is all-cause mortality. The study focuses on the following cancers: breast, colon, lung, liver, prostate, head and neck, pancreatic, oesophagus and stomach.
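To make the extrapolation step concrete, here is a minimal Python sketch. It assumes the simplest parametric survival function, the exponential model, which is only one illustrative choice and is not stated to be a model used in this study. The function names and the toy follow-up data are hypothetical. The model is fitted to patients followed for at most three years and then used to forecast survival well beyond the observed window, which is exactly where the uncertainty arises.

```python
import math
from typing import Sequence


def fit_exponential_rate(durations: Sequence[float], events: Sequence[int]) -> float:
    """Maximum-likelihood hazard rate for an exponential survival model.

    durations: follow-up time per patient (years).
    events: 1 if the patient died, 0 if follow-up ended without death (censored).
    With right-censored data the estimate is deaths / total follow-up time.
    """
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("total follow-up time must be positive")
    return sum(events) / total_time


def exponential_survival(rate: float, t: float) -> float:
    """Predicted probability of surviving beyond time t: S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


# Hypothetical toy data: no patient is followed for more than 3 years.
durations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.0]
events = [1, 0, 1, 0, 1, 0, 0, 0]

rate = fit_exponential_rate(durations, events)
for years in (1, 5, 10):
    # The 5- and 10-year values are forecasts beyond the observed follow-up.
    print(f"Predicted survival at {years} years: {exponential_survival(rate, years):.3f}")
```

The 5- and 10-year figures depend entirely on the assumed shape of the curve, so a different survival function fitted to the same three years of data could give a noticeably different long-term forecast.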
We mapped our data to the OMOP Common Data Model standard, and the other participating hospitals did the same, so that all our data was comparable. A single analysis script was then written and shared with every participating site. This is called a federated approach, and it ensures that the analysis is standardised across sites.
This study has descriptive aspects, because it focuses on defining specific prevalent cancers (breast, colorectal, lung, liver, prostate, head/neck, pancreatic, oesophagus or stomach cancer) and on identifying trends in disease based on stratification by other comorbid and lifestyle factors (e.g. age and sex). For the survival analyses, patients will be followed from index date (cancer diagnosis) to the last date of available data and assessed for the occurrence of death from any cause. We require this type of cohort study design because it allows us to calculate survival rates at 1, 5, and 10 years from the index date of cancer diagnosis.
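The follow-up design described above can be sketched in Python using the Kaplan-Meier estimator, a standard way to estimate observed survival when some patients are still alive at their last available date. This is an illustrative sketch only: the source does not name the estimator the study uses, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    durations: years from index date (diagnosis) to death or last available data.
    events: 1 if the patient died (any cause), 0 if censored at last available data.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Estimated probability of surviving beyond time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical toy cohort: follow-up in years and whether death was observed.
durations = [0.5, 1.2, 2.0, 3.0, 6.0, 7.5, 11.0, 12.0]
events = [1, 0, 1, 1, 0, 1, 0, 0]

curve = kaplan_meier(durations, events)
for years in (1, 5, 10):
    print(f"Estimated survival at {years} years: {survival_at(curve, years):.3f}")
```

An estimate at a given time point is only meaningful if some patients are still under follow-up at that time, which is why long-running real-world records are needed for the 10-year figure.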
The four objectives we aim to accomplish are:
- To develop and assess data quality of phenotypes of the studied cancers.
- To estimate age-sex-specific incidence rates and prevalence of studied cancers.
- To estimate overall observed survival of the studied cancers.
- To determine how well different standard survival functions predict survival rates 1, 5, or 10 years after diagnosis.
By fulfilling these objectives, we will be able to assess the incidence, prevalence and survival of these cancers. The results will be used to develop a user-friendly tool for health technology assessments.
This work is of benefit to patients, as study findings could improve patient care by indirectly informing public health policy, through evaluating the usefulness of real-world data (RWD) in survival extrapolation for common cancers over the lifespan of patients.
The study output can be viewed here.
List of contributing sites:
University of Edinburgh, University of Oxford, IDIAP Jordi Gol, SIDiAP, Netherlands Comprehensive Cancer Organisation, Cancer Registry of Norway, Helsinki University Hospital, Hospital Universitario Virgen Macarena, Universite de Geneve, Institut d’Assistencia Sanitaria, ULSM, ULS Regiao de Aveiro, ULSGE, PCi, University Medical Center Rotterdam, University of Tartu.




