Hi, I am Théo Sourget

Research Assistant at the IT University of Copenhagen - PURRlab.

Passionate about Data Science and AI, I graduated in 2023 with a Master’s degree in Data Science from the Université de Rouen. During my studies, I specialised in Medical Image Analysis, with projects focusing on the classification and segmentation of medical images using deep learning models. These projects also covered training transformer models on small datasets and the effect of data augmentation and transfer learning. I’m now a Research Assistant in the PURRlab at the IT University of Copenhagen.



IT University of Copenhagen - PURRlab

October 2023 - Present, Copenhagen (Denmark)

Research Assistant

October 2023 - Present

  • Study the usage of non-relevant information (e.g. background) in chest X-ray images by CNN models
  • Studied how public medical image datasets are referenced in research papers
Assistant Lecturer

January 2024 - Present

  • Prepare and teach lectures
  • Prepare exercise sessions
Teaching Assistant

October 2023 - December 2023

  • Help students during practical work

Data Scientist Intern
Capgemini Engineering - Medic@

April 2023 - September 2023, Illkirch-Graffenstaden (France)

  • Detection, segmentation and numbering of teeth in dental panoramic X-rays.
  • Comparison of Mask-RCNN and Detection Transformer (DETR).
  • Comparison of “classical” data augmentation techniques with augmentation based on generating new panoramic images.


April 2021 - July 2021, Vannes (France)


June 2021 - July 2021

  • Continuation of the internship
  • Creation of a dashboard with Qlik Sense using the previously created API
Developer Intern

April 2021 - June 2021

  • Development of a storage-related data analysis website with Python and Streamlit
  • Deployment with Docker
  • Presentation of the tool to the team


Master Data Science and Engineering (SID)
Grade: 16.215 out of 20 (Valedictorian, Graduated with Highest Honors)
Université de Rouen
Bachelor's degree in Computer Science and Data Science
Grade: 14.958 out of 20 (Valedictorian, Graduated with Honors)
IUT de Vannes
Two-year diploma in Computer Science
Grade: 13.835 out of 20


PDF Annotator

Web app to annotate PDF files. Files can be annotated by multiple users, using up to two initial sets of labels.

Citation finder

Website using multiple APIs, such as OpenAlex, to search for papers that reference a given paper and for papers matching keywords and concepts.

Comparison of U-Net and SegFormer for medical image segmentation (Master 2 project)
2022 - 2023

Comparison of SegFormer and U-Net for semantic segmentation on the CAMUS dataset (cardiac ultrasound images).

Glaucoma Detection with CNN (Master 1 project)
2021 - 2022

Classification of eye fundus images for glaucoma detection with a convolutional neural network.

Bachelor's degree project
2020 - 2021

Analysis of a dataset on the 2011-2012 season of the Premier League.

Astronomical observation website

Development of an astronomical observation website with React-Flask-MongoDB.


Development of a website and a mobile application for sports management with React/React Native.


Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets. There is, however, no adopted standard for citing the datasets used in scientific papers, which makes dataset usage difficult to track. In this work, we present two open-source tools we created that could help with the detection of dataset usage: a pipeline using OpenAlex and full-text analysis, and a PDF annotation software used in our study to manually label the presence of datasets. We applied both tools to a study of the usage of 20 publicly available medical datasets in papers from MICCAI and MIDL. We compute the proportion and the evolution between 2013 and 2023 of three types of presence in a paper: cited, mentioned in the full text, and both cited and mentioned. Our findings show that usage is concentrated on a limited set of datasets. We also highlight different citing practices, which make automated tracking difficult.

Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques

In this work, we focused on deep learning image processing in the context of oral rare diseases, which pose challenges due to limited data availability. A crucial step involves teeth detection, segmentation and numbering in panoramic radiographs. To this end, we used a dataset consisting of 156 panoramic radiographs from individuals with rare oral diseases, labeled by experts. We trained the Detection Transformer (DETR) neural network for teeth detection, segmentation, and numbering across the 52 teeth classes. In addition, we used data augmentation techniques, including geometric transformations. Finally, we generated new panoramic images using inpainting techniques with Stable Diffusion, by removing teeth from a panoramic radiograph and integrating teeth into it. The results showed a mAP exceeding 0.69 for DETR without data augmentation. The mAP improved to 0.82 when data augmentation techniques were used. Furthermore, we observed promising performance when using new panoramic radiographs generated with the inpainting technique, with a mAP of 0.76.

The U-Net model, introduced in 2015, is established as the state-of-the-art architecture for medical image segmentation, along with its variants UNet++, nnU-Net, V-Net, etc. Vision transformers made a breakthrough in the computer vision world in 2021. Since then, many transformer-based or hybrid architectures (combining convolutional blocks and transformer blocks) have been proposed for image segmentation, challenging the predominance of U-Net. In this paper, we ask whether transformers could overtake U-Net for medical image segmentation. We compare SegFormer, one of the most popular transformer architectures for segmentation, to U-Net using three publicly available medical image datasets that cover various modalities and organs: the segmentation of cardiac structures in ultrasound images from the CAMUS challenge, the segmentation of polyps in endoscopy images, and the segmentation of instruments in colonoscopy images from the MedAI challenge. We compare them in the light of various metrics (segmentation performance, training time) and show that SegFormer can be a true competitor to U-Net and should be carefully considered for future tasks in medical image segmentation.

