Mask of Truth: Model Sensitivity to Unexpected Regions of Medical Images

¹IT University of Copenhagen, ²Copenhagen University Hospital, Herlev and Gentofte, ³Radiological AI Testcenter, ⁴Zealand University Hospital, Department of Radiology

Abstract

In this work, we challenge the capacity of convolutional neural networks (CNNs) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image.

We show that all models trained on the PadChest dataset, irrespective of the masking strategy, obtain an Area Under the Curve (AUC) above random. Moreover, the models trained on full images perform well on images without the region of interest (ROI), even better than on images containing only the ROI.

We also reveal a possible spurious correlation in the Chákṣu dataset, even though its performance is more aligned with what one would expect from an unbiased model.

We go beyond the performance analysis by using the explainability method SHAP and analyzing the learned embeddings. We also asked a radiology resident to interpret chest X-rays under the different maskings to complement our findings with clinical knowledge.

Masking out the ROI and evaluation

Chest X-rays

Data

We use the PadChest dataset, a multi-class and multi-label dataset with more than 160,000 images annotated with 174 labels. We train the models for five classes: cardiomegaly, pneumonia, atelectasis, pneumothorax, and effusion. We remove the lungs from the images using the lung masks provided by the CheXmask dataset.
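To make the masking step concrete, below is a minimal sketch, not the exact pipeline from the paper, of how a precomputed lung segmentation can be used to keep only the ROI or to remove it. The file paths, the binary mask format, and the fill value are assumptions for illustration.

import numpy as np
from PIL import Image

def apply_roi_mask(image_path, mask_path, keep_roi=True, fill_value=0):
    """Keep only the ROI or blank it out, given a binary segmentation mask.

    The paths and mask format are placeholders: the mask is assumed to be a
    binary lung segmentation (e.g. derived from CheXmask) where non-zero
    pixels mark the ROI.
    """
    image = np.array(Image.open(image_path).convert("L"), dtype=np.uint8)
    mask = np.array(Image.open(mask_path).convert("L")) > 0

    masked = image.copy()
    if keep_roi:
        masked[~mask] = fill_value  # keep only the lungs
    else:
        masked[mask] = fill_value   # remove the lungs, keep the rest
    return Image.fromarray(masked)

# Example: produce the "without ROI" variant of one chest X-ray
# apply_roi_mask("xray.png", "lung_mask.png", keep_roi=False).save("xray_no_roi.png")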

The five types of images for the PadChest dataset, depending on the masked area.

Good performance on all types of images

For each type of masking, we train a classifier and evaluate it on all masking types. For all classes, the models obtain good performance when trained and evaluated on the same type of images, even without the ROI.
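As an illustration of this train-on-one, evaluate-on-all protocol, here is a minimal sketch assuming one trained PyTorch model and one test loader per masking type. The masking names, dictionary structure, and macro-averaged AUC are illustrative choices, not the exact evaluation code from the paper.

import torch
from sklearn.metrics import roc_auc_score

MASKINGS = ["full", "roi_only", "no_roi"]  # illustrative subset of the masking types

@torch.no_grad()
def predict(model, loader, device="cpu"):
    """Collect sigmoid scores and labels from a multi-label classifier."""
    model.eval()
    scores, labels = [], []
    for x, y in loader:
        scores.append(torch.sigmoid(model(x.to(device))).cpu())
        labels.append(y)
    return torch.cat(scores).numpy(), torch.cat(labels).numpy()

def cross_evaluate(models, loaders, device="cpu"):
    """AUC of each model (trained on one masking type) on every masking type."""
    results = {}
    for trained_on, model in models.items():
        for eval_on in MASKINGS:
            scores, labels = predict(model, loaders[eval_on], device)
            # Macro AUC over the five findings (cardiomegaly, pneumonia, ...)
            results[(trained_on, eval_on)] = roc_auc_score(labels, scores, average="macro")
    return results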


Obvious and non-obvious shortcuts

We apply SHAP to evaluate which elements the model uses to make its classification. We see the model focusing on visible shortcuts (e.g. a pacemaker) but also on the relevant parts of the image (e.g. the heart). Without the ROI, the model focuses on areas at the border of the image that do not contain obvious shortcuts.
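For reference, a hedged sketch of how SHAP attributions can be computed for a trained PyTorch classifier. The choice of GradientExplainer and the background batch are assumptions and may differ from the exact settings used in the paper.

import shap

def explain_with_shap(model, background, images):
    """Per-pixel SHAP attributions for a trained PyTorch classifier.

    `background` is a small batch of training images used as the reference
    distribution and `images` are the X-rays to explain; these names and the
    choice of GradientExplainer are illustrative.
    """
    model.eval()
    explainer = shap.GradientExplainer(model, background)
    # Returns one set of attributions per output class; the exact container
    # (list vs. stacked array) depends on the installed SHAP version.
    return explainer.shap_values(images)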

Example of SHAP values for different chest X-rays.


Eye fundus

Data

We use the Chákṣu dataset for the binary classification of glaucoma, with 1,345 images acquired with three different fundus cameras. We remove the optic disc from the images using the masks provided with the dataset.

The five types of images for the Chákṣu dataset, depending on the masked area.


No signs of shortcut learning from the masking...

For this dataset, the low performance on images without the ROI is more in line with what we expect from models that are not affected by shortcut learning.

AUCs for the glaucoma class

...But a potential shortcut within the region of interest

Grounded in medical knowledge, we show that the size of the optic disc may act as a shortcut, as it does for clinicians. On the model trained with only the optic disc, increasing the mask size for a single class (glaucoma in orange or healthy in blue) to simulate a larger optic disc reveals a potential shortcut within the ROI, as sketched below.
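A minimal sketch of how such a simulation could be implemented, assuming a binary optic-disc mask. The morphological dilation, its structuring element, and the grow_pixels parameter are illustrative and not necessarily the procedure used in the paper.

import numpy as np
from scipy.ndimage import binary_dilation

def enlarge_optic_disc_mask(mask, grow_pixels=5):
    """Grow a binary optic-disc mask to simulate a larger optic disc.

    `mask` is a boolean array where True marks the optic disc and
    `grow_pixels` controls how far the disc border is pushed outwards.
    Applying this only to images of one class (e.g. glaucoma) before
    cropping to the ROI tests whether disc size alone drives the prediction.
    """
    structure = np.ones((3, 3), dtype=bool)  # one 8-connected dilation per iteration
    return binary_dilation(mask, structure=structure, iterations=grow_pixels)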

Evolution of the AUC when increasing the mask size.

See more in the paper

Check out the full paper for additional results on applying the models to out-of-distribution data, which shows their poor generalizability. Furthermore, the study with a radiology resident reveals the difficulty of making a diagnosis from the images alone, confirming the need for additional information from other modalities (e.g. the radiology report). Finally, we also discuss the limitations of explainability methods and embedding analyses.

BibTeX

@article{sourget2025mask,
  title={Mask of Truth: Model Sensitivity to Unexpected Regions of Medical Images},
  author={Sourget, Th{\'e}o and Hestbek-M{\o}ller, Michelle and Jim{\'e}nez-S{\'a}nchez, Amelia and Junchi Xu, Jack and Cheplygina, Veronika},
  journal={Journal of Imaging Informatics in Medicine},
  pages={1--18},
  year={2025},
  publisher={Springer}
}