Multiple PhDs and postdocs on evaluation metrics for multimodal health data
The PURRlab (Pattern Recognition Revisited lab) at the IT University of Copenhagen invites highly motivated individuals to apply for PhD or 2-year postdoc positions starting in late 2025 or in 2026. The earliest possible start date is 1 October 2025.
The project is funded by the Novo Nordisk Foundation Data Science Ascending Investigator grant titled "CHEETAH: CHallenges of Evaluating Teams and Algorithms" and is led by Associate Professor Veronika Cheplygina.
Project description
Machine learning (ML) competitions are often touted as drivers of algorithm development in healthcare but face limitations in real-world applications. An example is a competition on detecting lung cancer in chest images, where the team that correctly identifies the most images with cancer wins. Such competitions attract many international teams with incentives of money or prestige. While competitions are said to spur innovation, they often result in overly similar algorithms that excel only on one specific accuracy metric, are not robust, and fail to generalize to diverse, real-world data.
I posit that a single performance metric such as accuracy is insufficient to capture algorithm robustness, for example how the algorithm performs on rare patient cases. Relying on a single performance metric also leads to overly similar algorithms, which do not bring added value despite their high training costs and carbon footprint. Furthermore, as research on women and other underrepresented groups in computer science shows, competition may deter them from entering or staying in the field.
I therefore propose to design competitions with multiple metrics that vary both in what they measure (e.g. accuracy or sensitivity) and in which patient subgroups they are measured on. My team will focus on two multimodal disease (risk) prediction datasets: chest x-rays with radiology reports, and retinal images with tabular clinical measurements. Inspired by techniques like generative models, transfer learning and data distillation, we will first develop novel methods to evaluate and increase the diversity of the evaluation data. We will then design ways to evaluate the similarity of algorithms, and develop methods to combine and reuse (parts of) them, such that robustness can be increased without a disproportionate carbon footprint. Finally, we will organize competitions in education and at conferences, where we will study how the novel design affects underrepresented groups in data science.