Johannes Schmidt
The project aims to leverage topological and geometrical features in data to develop new machine learning architectures and apply them to scientific problems. Methodologically, the starting point is the Euler Characteristic Transform (ECT), which encodes the structure of simplicial complexes derived from data such as graphs and point clouds.
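To make the transform concrete, here is a minimal sketch of the ECT for a graph, i.e. a one-dimensional simplicial complex. For a direction and a threshold, it counts the vertices and edges lying below that threshold along the direction and returns their difference, the Euler characteristic of the sublevel complex. The function name `ect_graph` and its arguments are illustrative assumptions, not part of the project's code.

```python
import numpy as np


def ect_graph(points, edges, directions, thresholds):
    """Euler Characteristic Transform of a graph embedded in R^d.

    points:     (n_vertices, d) vertex coordinates
    edges:      (n_edges, 2) integer vertex indices
    directions: (n_directions, d) unit vectors
    thresholds: (n_thresholds,) filtration values
    Returns an integer array of shape (n_directions, n_thresholds).
    """
    points = np.asarray(points, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    directions = np.asarray(directions, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Height of every vertex along every direction: (n_directions, n_vertices).
    vertex_heights = directions @ points.T
    # An edge enters the filtration once its higher endpoint has entered.
    edge_heights = np.maximum(
        vertex_heights[:, edges[:, 0]], vertex_heights[:, edges[:, 1]]
    )

    # Count the simplices present at each threshold, per direction.
    n_vertices = (vertex_heights[:, :, None] <= thresholds).sum(axis=1)
    n_edges = (edge_heights[:, :, None] <= thresholds).sum(axis=1)

    # Euler characteristic of a graph: #vertices - #edges.
    return n_vertices - n_edges


# Toy example (assumed data): the boundary of the unit square.
square_points = [[0, 0], [1, 0], [1, 1], [0, 1]]
square_edges = [[0, 1], [1, 2], [2, 3], [3, 0]]
square_ect = ect_graph(
    square_points,
    square_edges,
    directions=[[1, 0], [0, 1]],
    thresholds=[-0.5, 0.0, 0.5, 1.0],
)
print(square_ect)
```

Each row of the result corresponds to one direction and each column to one threshold, so stacking many directions yields a two-dimensional array that can be treated as an image.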
Two application fields are planned. The first is protein solubility prediction, where protein structures are encoded via the ECT; one research direction here is to learn the probed directions and resolutions of the transform directly from data. Second, we will investigate whether encoding molecular structures from datasets such as QM9 or QMugs as images via the ECT enables the use of state-of-the-art pretrained generative image-to-image models, such as diffusion models.
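Learning directions from data requires the transform to vary smoothly with them. One common way to achieve this, shown here only as an assumed illustration and not necessarily the project's approach, is to replace the hard threshold test with a sigmoid. The name `smooth_ect_graph` and the `sharpness` parameter are hypothetical; the sketch computes only the forward pass in NumPy, and an automatic differentiation framework would be needed to actually optimize the directions.

```python
import numpy as np


def smooth_ect_graph(points, edges, directions, thresholds, sharpness=50.0):
    """Sigmoid-smoothed ECT of a graph (forward pass only).

    Each simplex contributes sigmoid(sharpness * (t - height)) instead of
    the hard indicator [height <= t], so the output varies smoothly with
    the directions. Larger `sharpness` approaches the hard transform.
    """
    points = np.asarray(points, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    directions = np.asarray(directions, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Heights along each direction: (n_directions, n_vertices).
    vertex_heights = directions @ points.T
    # An edge is as high as its higher endpoint.
    edge_heights = np.maximum(
        vertex_heights[:, edges[:, 0]], vertex_heights[:, edges[:, 1]]
    )

    # Soft counts of vertices and edges below each threshold.
    soft_vertices = sigmoid(
        sharpness * (thresholds - vertex_heights[:, :, None])
    ).sum(axis=1)
    soft_edges = sigmoid(
        sharpness * (thresholds - edge_heights[:, :, None])
    ).sum(axis=1)

    return soft_vertices - soft_edges


# Toy example (assumed data): the boundary of the unit square.
soft_square_ect = smooth_ect_graph(
    [[0, 0], [1, 0], [1, 1], [0, 1]],
    [[0, 1], [1, 2], [2, 3], [3, 0]],
    directions=[[1, 0]],
    thresholds=[-0.5, 0.5, 1.5],
)
print(np.round(soft_square_ect, 3))
```

The number of thresholds plays the role of the resolution: more thresholds give a finer sampling of the same curve along each direction.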
Possible further research includes combining information geometry with generative modeling, for example using the Fisher information metric and derivatives of the Kullback–Leibler divergence, to improve uncertainty estimation in generative models.
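For reference, the standard link between the two quantities named above can be written as follows, assuming a smooth parametric family $p_\theta$ and the usual regularity conditions. The symbols $g_{ij}$, $\theta$, and $\delta$ are notation introduced here for illustration.

```latex
\[
g_{ij}(\theta)
  = \mathbb{E}_{x \sim p_\theta}\!\left[
      \partial_i \log p_\theta(x)\, \partial_j \log p_\theta(x)
    \right],
\qquad
D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta+\delta}\right)
  = \tfrac{1}{2} \sum_{i,j} g_{ij}(\theta)\, \delta_i \delta_j
    + O\!\left(\|\delta\|^{3}\right).
\]
```

In other words, the Fisher information metric is the second derivative of the Kullback–Leibler divergence with respect to the perturbation $\delta$, evaluated at $\delta = 0$; the first-order term vanishes.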