Seminar 25/11: ‘Explainable Machine Learning for Cervical Cancer Risk Factors Assessment’

On November 25th, Sultan Imangaliyev presented his recent work, which uses interpretable machine learning methods to predict an individual patient’s risk of cervical cancer. An abstract for Sultan’s talk is below.

Although cervical cancer is preventable with cytological screening, it still causes more than half a million new cases per year and kills more than a quarter of a million people in the same period, largely because patients often have limited access to accurate routine screening. Predicting an individual patient’s risk, and choosing the best screening strategy for her, therefore becomes a fundamental problem to which machine-learning methods can provide an efficient, life-saving solution.

Although many complex machine-learning approaches achieve good prediction accuracy, their application in an actual public health setting is limited because their predictions are difficult to interpret and hence not actionable. In contrast to black-box models, interpretable methods explain why a certain prediction was made for a patient, i.e. such models can help to pinpoint which specific patient characteristics led to the prediction.

Explaining predictions from tree ensemble methods such as gradient boosting machines is often heuristic and not individualized for each prediction. To address this, we turn to recent applications of game theory and apply fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. SHAP provides a rich visualization of individualized feature attributions that improves over classic attribution summaries such as feature importance plots. On a real-world dataset, we demonstrate better agreement with human intuition through a user study, together with better identification of influential features.
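The local-accuracy property mentioned in the abstract (feature attributions sum exactly to the model’s output for the patient being explained) can be illustrated with a small brute-force Shapley computation. This is only a sketch on an invented toy risk model with made-up feature names, not the method from the talk; the actual work relies on the fast exact tree algorithm behind SHAP, which avoids this exponential enumeration:

```python
from itertools import combinations
from math import factorial

# Hypothetical toy scoring function over three illustrative risk factors
# (names and coefficients are invented for this sketch).
def model(age, smokes, hpv):
    return 0.1 + 0.02 * age + 0.3 * smokes + 0.5 * hpv + 0.2 * smokes * hpv

# Patient being explained, and a baseline (reference) patient.
x = {"age": 40, "smokes": 1, "hpv": 1}
baseline = {"age": 30, "smokes": 0, "hpv": 0}
features = list(x)

def value(subset):
    # Model output when features in `subset` take the patient's values
    # and all other features take baseline values.
    args = {f: (x[f] if f in subset else baseline[f]) for f in features}
    return model(**args)

def shapley(feature):
    # Classic Shapley formula: weighted average of the feature's
    # marginal contribution over all coalitions of the other features.
    n = len(features)
    others = [f for f in features if f != feature]
    total = 0.0
    for r in range(len(others) + 1):
        for s in combinations(others, r):
            w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += w * (value(set(s) | {feature}) - value(set(s)))
    return total

phi = {f: shapley(f) for f in features}
base = value(set())  # expected output at the baseline

# Local accuracy: base value plus attributions equals the full prediction.
print(base + sum(phi.values()))   # matches value(set(features))
print(value(set(features)))
```

Each feature’s attribution here is individualized: it depends on this particular patient’s values and on interactions with the other features (the smoking/HPV interaction term is split between those two attributions), which is exactly what a per-patient SHAP explanation conveys.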

Slides for Sultan’s talk are available here.
