Angel Reyero-Lobo

PhD
National Institute for Research in Digital Science and Technology (Inria)
Inferring variable importance in high-dimensional settings

With the increasing popularity of machine learning methods, there is a growing need to interpret them. On the one hand, it is essential to decipher the algorithm's black box for applications in sensitive areas such as healthcare, to understand which variables are most important for decision-making, and to achieve more robust and transparent algorithms. On the other hand, one might be interested in using AI to extract information about the data-generating mechanisms, which would allow more targeted research on specific variables and help domain experts by providing insights into complex, high-dimensional data.
When trying to explain AI models, there is often a trade-off between model transparency and complexity. However, to truly understand the nature of the data, it is necessary to have a measure of importance that is model-agnostic, meaning that it does not depend on the specific estimation model. Therefore, the goal of this project is to develop a variable importance measure that is model-agnostic, statistically robust, and computationally feasible. Several challenges arise in this area, such as high data dimensionality and strong correlations between variables.
In this thesis, two main directions are considered: the analysis and extension of the conditional permutation framework on the one hand, and the use of grouping in statistical inference on the other.

Track:
Academic Track