Riccardo Cadei
Modern machine learning methods make it possible to draw insights from volumes of data that are impossible for humans to process, with growing applications across the empirical sciences, e.g., biology, climate science, physics, and medicine. Data-driven scientific discovery requires accurate predictions, and prediction errors can propagate arbitrarily into (causally) biased conclusions. Even the largest neural networks trained with statistical learning objectives can miss cause-effect chains, invalidating the results.
During my PhD, I aim to revisit the desiderata for 'good' representations of high-dimensional observations in terms of causal downstream tasks. First, I aim to evaluate and explain the sources of bias in prediction-powered causal inference, making explicit the role of the representation learning step and the needs and challenges of different data sources, i.e., Randomized Controlled Trials (RCT) and Observational Studies (OS). Then, I plan to introduce a principled methodology to generalize causal inferences on high-dimensional observations from a small RCT to an arbitrary target population, leveraging a large pre-trained model and available OS without causal guarantees. Finally, I aim to extend this to dynamic settings and more entangled observations, contributing to real-world problems I care about.
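As a toy illustration of how prediction errors can bias a causal conclusion, the following sketch simulates an RCT in which a predictor replaces manual outcome annotation. Everything in it is an assumption made for illustration and not part of the proposed methodology: the binary outcome, the effect size, the function names `simulate_rct` and `difference_in_means`, and in particular the error model, in which the predictor misses a fraction of positive outcomes only among treated units.

```python
import random


def difference_in_means(outcomes, treatments):
    """Estimate the average treatment effect (ATE) in an RCT as the
    difference between the mean outcome of treated and control units."""
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate_rct(n=200_000, miss_rate_treated=0.25, seed=0):
    """Simulate a hypothetical RCT with a binary outcome and an imperfect
    outcome predictor whose errors depend on the treatment (assumption)."""
    rng = random.Random(seed)
    treatments, outcomes, predictions = [], [], []
    for _ in range(n):
        # Randomized assignment: treatment is independent of everything else.
        t = 1 if rng.random() < 0.5 else 0
        # True outcome: P(Y=1) is 0.3 under control and 0.5 under treatment,
        # so the true ATE is 0.2 by construction.
        y = 1 if rng.random() < 0.3 + 0.2 * t else 0
        # Predicted outcome: correct, except that positives among treated
        # units are missed with probability `miss_rate_treated`.
        y_hat = y
        if t == 1 and y == 1 and rng.random() < miss_rate_treated:
            y_hat = 0
        treatments.append(t)
        outcomes.append(y)
        predictions.append(y_hat)
    return treatments, outcomes, predictions


treatments, outcomes, predictions = simulate_rct()

accuracy = sum(y == p for y, p in zip(outcomes, predictions)) / len(outcomes)
ate_from_labels = difference_in_means(outcomes, treatments)
ate_from_predictions = difference_in_means(predictions, treatments)

print(f"predictor accuracy:   {accuracy:.3f}")
print(f"ATE from true labels: {ate_from_labels:.3f}")
print(f"ATE from predictions: {ate_from_predictions:.3f}")
```

Under these assumptions the predictor is right on about 94% of the units, yet the ATE computed from its predictions is about 0.075 in expectation instead of the true 0.2: an accurate predictor in the statistical sense can still yield a strongly biased causal estimate when its errors are not independent of the treatment.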