Laure Ciernik
Accurate disease classification is crucial for timely diagnosis and tailored treatments. For rare diseases, however, classification is challenging because data are limited and the diseases are highly heterogeneous and complex. This project focuses on rare disease classification with deep neural networks. We will use models that project data into representation spaces that capture the semantic categories of diseases. We will explore three aspects.
First, we will assess existing models and their learned representations, evaluating their characteristics and similarities and identifying desirable traits. We also aim to investigate how batch effects, that is, technical variation in medical data arising from factors such as data source and patient characteristics, affect model representations, and to explore methods for mitigating them.
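As one illustration of how similarity between learned representations could be quantified, the sketch below computes linear centered kernel alignment (CKA) between two embedding matrices for the same samples. CKA is an assumed choice made here for illustration, not a method named by the project, and the function name `linear_cka` is hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representations of the same n samples.

    x has shape (n, d1) and y has shape (n, d2). The score lies in [0, 1]
    and is invariant to orthogonal transformations and isotropic scaling.
    Illustrative choice of similarity measure, not the project's method.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each feature so the score does not depend on feature means.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

A score near 1 indicates that the two models encode the samples with a similar geometry. Comparing scores computed within and across data sources is one possible way to probe how strongly batch effects shape a representation.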
Second, we will apply representation learning to large, unlabeled datasets to capture biological patterns across various conditions. This approach has been shown to facilitate knowledge transfer from common to rare diseases. Our primary focus will be on histopathological and DNA methylation data.
Finally, we will address label imbalance, which remains an issue even with good representations. Therefore, we will investigate training techniques for the downstream disease classification task that can handle this imbalance and produce well-calibrated predictions.
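To make the two goals above concrete, the following minimal sketch shows one common way to counter label imbalance (inverse-frequency class weights) and one common way to measure calibration (expected calibration error, ECE). Both are assumed examples rather than the techniques the project will adopt, and the function names `balanced_class_weights` and `expected_calibration_error` are hypothetical.

```python
import numpy as np


def balanced_class_weights(labels, n_classes):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Classes that do not occur in `labels` receive weight 0.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = labels.size / (n_classes * counts[present])
    return weights


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins over the top-1 prediction."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        if in_bin.any():
            # Gap between accuracy and mean confidence, weighted by bin size.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)
```

The weights could, for example, be passed to a weighted cross-entropy loss during training, and ECE could then be reported on held-out data alongside per-class accuracy. A lower ECE means predicted confidences track observed accuracy more closely.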