Researchers from the University of Oxford and the University of Nottingham have developed CanPredict, a tool that uses existing health records to predict a person’s risk of developing lung cancer within the next ten years. By identifying high-risk individuals for earlier screening, CanPredict could save time, money, and lives.

Lung cancer is the second most common cancer and the leading cause of cancer deaths, and catching it early through screening significantly improves survival rates. CanPredict analyses existing patient health records and can be run for an individual GP surgery or nationally, automatically prioritising patients and alerting their GPs that further screening may be warranted. The tool was developed and tested using the anonymised health records of more than 19 million adults across the UK, and it showed better sensitivity than currently recommended methods for predicting lung cancer risk.


You can see Aafke Eppinga in the weekly explainers, the innovation videos she makes during her cycling trip and of course on IO-tv. She likes to write, but filming and editing videos is even more fun. Luckily she can do both for IO.

Developing CanPredict: A New Era in Lung Cancer Screening

CanPredict was developed using two separate sets of health record data: the QResearch Database and the Clinical Practice Research Datalink (CPRD). The QResearch Database contains the anonymised health records of more than 35 million patients across the UK, spanning all ethnicities and social groups. From it, the researchers identified 13 million people aged 25 to 84, of whom 73,380 had been diagnosed with lung cancer. They then analysed these records to find factors that could be used to statistically predict a person’s risk of developing the disease, including smoking, age, ethnicity, body mass index, existing medical conditions, and social deprivation, among others.
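To make "statistically predict their risk" more concrete, the sketch below turns a handful of risk factors into an absolute ten-year risk. It assumes a Cox proportional hazards form, in which risk = 1 − S₀(10)^exp(linear predictor); that form, the feature names, the coefficients, and the baseline survival value are all hypothetical illustrations and are not the published CanPredict model, whose actual specification is given in the paper.

```python
import math

# HYPOTHETICAL log hazard ratios, for illustration only.
# These are NOT the published CanPredict coefficients.
# Each feature is expressed relative to a reference individual (all zeros).
HYPOTHETICAL_COEFFS = {
    "age_decades_over_50": 0.60,   # (age - 50) / 10
    "current_smoker": 1.50,        # 1 if current smoker, else 0
    "copd": 0.70,                  # 1 if COPD recorded, else 0
    "deprivation_score": 0.10,     # standardised deprivation score
}

# HYPOTHETICAL 10-year survival (free of lung cancer) for the reference individual.
HYPOTHETICAL_BASELINE_SURVIVAL_10Y = 0.998


def ten_year_risk(patient: dict[str, float]) -> float:
    """Absolute 10-year risk under an assumed Cox proportional hazards form."""
    # Linear predictor: weighted sum of the patient's risk factors.
    lp = sum(coef * patient.get(name, 0.0) for name, coef in HYPOTHETICAL_COEFFS.items())
    # Risk = 1 - S0(t) ** exp(lp); lp = 0 gives the reference individual's risk.
    return 1.0 - HYPOTHETICAL_BASELINE_SURVIVAL_10Y ** math.exp(lp)


if __name__ == "__main__":
    print(ten_year_risk({}))
    print(ten_year_risk({"age_decades_over_50": 1.5, "current_smoker": 1, "copd": 1}))
```

The reference individual gets the baseline risk of about 0.2%, and each additional risk factor multiplies the hazard, which is why an older smoker with COPD ends up with a much higher predicted risk.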

Once CanPredict had been developed, it was tested on a separate set of anonymised GP health records from the CPRD, covering a further 2.54 million people. The researchers checked which individuals the new tool predicted to be at the greatest risk of developing lung cancer and compared them with those who actually went on to develop the disease. CanPredict identified more of the people who developed lung cancer, showing better sensitivity than currently recommended methods for five-, six-, and ten-year risk forecasts.
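The validation step above, flagging the people with the highest predicted risk and checking how many actual cases were among them, amounts to measuring sensitivity at a cut-off. The sketch below is a minimal illustration of that idea; the function name and the "top fraction" cut-off are assumptions, since the article does not say exactly which thresholds the researchers used.

```python
def sensitivity_at_top_fraction(risks: list[float], outcomes: list[int], fraction: float) -> float:
    """Share of observed cases that fall within the top `fraction` of predicted risk.

    risks:    predicted risk per person
    outcomes: 1 if the person went on to develop the disease, else 0
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_cases = sum(outcomes)
    if total_cases == 0:
        raise ValueError("no observed cases to detect")
    # Number of people flagged for screening (at least one).
    n_flagged = max(1, round(len(risks) * fraction))
    # Indices sorted from highest to lowest predicted risk.
    ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    flagged = ranked[:n_flagged]
    # Cases that the model caught among the flagged group.
    caught = sum(outcomes[i] for i in flagged)
    return caught / total_cases


if __name__ == "__main__":
    risks = [0.9, 0.1, 0.8, 0.2, 0.05, 0.7]
    outcomes = [1, 0, 0, 1, 0, 1]
    print(sensitivity_at_top_fraction(risks, outcomes, 0.5))
```

Here the top half by risk contains two of the three people who developed the disease, giving a sensitivity of about 0.67. Running the same function on two models' predictions for the same people is one simple way to compare them, in the spirit of the comparison described above.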

Improving Early Diagnosis: The Benefits of CanPredict

Professor Julia Hippisley-Cox, Senior Author of the study, said that improving early diagnosis of lung cancer is incredibly important, not only for the NHS but especially for patients and their families[1]. CanPredict can help prioritise patients for screening and spot lung cancer earlier, when treatments are more likely to help. The validated tool has the potential to save lives and, by streamlining the administrative process, to improve the patient experience.

CanPredict could also significantly reduce the burden on NHS staff, saving time and money. Dr Weiqi Liao, Lead Author of the publication, explained that because the tool works from existing patient records and can be run for a single GP surgery or nationally, it prioritises patients automatically and objectively, alerting their GPs when someone might benefit from further screening.
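The automatic prioritisation described above can be pictured as grouping patients by practice, keeping those above a risk threshold, and sorting them so each GP sees the highest-risk patients first. The sketch below is a hypothetical illustration: the field names (`patient_id`, `practice_id`, `risk`) and the threshold value are assumptions, as the article does not describe CanPredict's actual alerting mechanism.

```python
from collections import defaultdict


def prioritise_by_practice(patients: list[dict], threshold: float) -> dict[str, list[str]]:
    """Map each GP practice to its at-risk patient IDs, highest risk first.

    Each patient is a dict with hypothetical keys: patient_id, practice_id, risk.
    """
    at_risk: dict[str, list[dict]] = defaultdict(list)
    for p in patients:
        # Keep only patients at or above the (hypothetical) alert threshold.
        if p["risk"] >= threshold:
            at_risk[p["practice_id"]].append(p)
    # Sort each practice's list so the highest-risk patients come first.
    return {
        practice: [p["patient_id"] for p in sorted(group, key=lambda p: p["risk"], reverse=True)]
        for practice, group in at_risk.items()
    }


if __name__ == "__main__":
    patients = [
        {"patient_id": "p1", "practice_id": "A", "risk": 0.03},
        {"patient_id": "p2", "practice_id": "A", "risk": 0.01},
        {"patient_id": "p3", "practice_id": "B", "risk": 0.05},
        {"patient_id": "p4", "practice_id": "A", "risk": 0.06},
    ]
    print(prioritise_by_practice(patients, threshold=0.02))
```

Running it over all records corresponds to the national mode; filtering the list to one `practice_id` before calling it corresponds to running it for a single surgery.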

Future Implementation and Comparisons with Other Models

The researchers plan to make CanPredict publicly available, subject to further funding to implement it in day-to-day practice and to ensure Medicines and Healthcare products Regulatory Agency (MHRA) medical device compliance. The full paper, titled “Predicting the future risk of lung cancer: development, and internal and external validation of the CanPredict (lung) model in 19.67 million people and evaluation of model performance against seven other risk prediction models,” can be read in The Lancet Respiratory Medicine[1].

A separate systematic review of 25 lung cancer risk prediction models, which did not include CanPredict, covered epidemiological models, clinical assessment models, and the two-stage clonal expansion model. It found that epidemiological models had undergone more external validation than clinical assessment models. Discrimination, measured as the area under the curve (AUC), ranged from 0.57 to 0.879, and calibration varied between models. The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial 2012 Model Version (PLCOM2012) and the Hoggart models performed best overall, although both still required further validation. The review’s authors recommended that future research test multiple models on the same data set, reporting sensitivity, specificity, model accuracy, and positive predictive values at optimal risk thresholds.
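The metrics named in the review can be computed directly from predicted risks and observed outcomes. The sketch below uses the standard pairwise definition of AUC (the probability that a randomly chosen case is ranked above a randomly chosen non-case, with ties counting half), where 0.5 corresponds to chance ranking and 1.0 to perfect ranking, plus sensitivity, specificity, and positive predictive value at a chosen threshold. The data and threshold are made-up illustrations, not values from the review.

```python
def auc(risks: list[float], outcomes: list[int]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(case ranked above non-case), ties count half."""
    cases = [r for r, y in zip(risks, outcomes) if y]
    controls = [r for r, y in zip(risks, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def threshold_metrics(risks: list[float], outcomes: list[int], threshold: float) -> dict[str, float]:
    """Sensitivity, specificity and PPV when everyone at or above `threshold` is flagged."""
    tp = fp = tn = fn = 0
    for r, y in zip(risks, outcomes):
        flagged = r >= threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # cases caught
        "specificity": tn / (tn + fp) if tn + fp else nan,  # non-cases correctly not flagged
        "ppv": tp / (tp + fp) if tp + fp else nan,          # flagged people who are cases
    }


if __name__ == "__main__":
    risks = [0.9, 0.1, 0.8, 0.2, 0.05, 0.7]
    outcomes = [1, 0, 0, 1, 0, 1]
    print(auc(risks, outcomes))
    print(threshold_metrics(risks, outcomes, threshold=0.5))
```

On this toy data the AUC is 7/9 (about 0.78), and at a threshold of 0.5 sensitivity, specificity, and PPV each come out at 2/3. AUC summarises ranking across all thresholds, while the other three depend on where the screening cut-off is set, which is why the review recommends reporting them at optimal thresholds.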

Sources Laio used to write this article:

New tool uses existing health records to predict people’s risk of developing lung cancer within the next 10 years.
Risk Prediction Models for Lung Cancer: A Systematic Review.