Angel Reyero-Lobo
With the increasing popularity of machine learning methods, there is a growing need to interpret them. On the one hand, it is essential to decipher the algorithm's black box for applications in sensitive areas such as healthcare, to understand which variables matter most for decision-making, and to build more robust and transparent algorithms. On the other hand, one might be interested in extracting information about the data-generating mechanisms using AI, which would allow more targeted research on specific variables and help domain experts by providing insights into complex, high-dimensional data.
When trying to explain AI models, there is often a trade-off between model transparency and complexity. However, to truly understand the nature of the data, it is necessary to have a measure of importance that is model-agnostic, meaning that it does not depend on the specific estimation model. Therefore, the goal of this thesis is to develop a variable importance measure that is model-agnostic, statistically robust, and computationally feasible. Several challenges arise in this area, such as high data dimensionality and strong correlations between variables.
In this thesis, two main directions are considered: the analysis and extension of the conditional permutation framework on the one hand, and the use of grouping in statistical inference on the other.
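To make the conditional permutation idea concrete, the following is a minimal, self-contained sketch (not the thesis's implementation): a variable X_j is reconstructed from the remaining covariates, only its residual part is shuffled, and importance is read off as the resulting increase in prediction loss. Ordinary least squares is used here as a hypothetical stand-in for both the predictive model and the conditional-mean estimator; all function names and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends only on X[:, 0]; X[:, 1] is a correlated null variable.
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)  # strongly correlated with x0
X = np.column_stack([x0, x1])
y = 2.0 * x0 + rng.normal(scale=0.5, size=n)

def ols_predict(X_train, y_train, X_test):
    """Least-squares fit with intercept; stands in for the black-box model."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def conditional_permutation_importance(X, y, j, n_perm=20, rng=rng):
    """Sketch of conditional permutation importance for column j:
    estimate E[X_j | X_-j], permute only the residual part (so the
    dependence on the other covariates is preserved), and report the
    average increase in squared-error loss."""
    y_hat = ols_predict(X, y, X)              # in-sample fit, for simplicity
    base_loss = np.mean((y - y_hat) ** 2)
    X_minus = np.delete(X, j, axis=1)
    xj_hat = ols_predict(X_minus, X[:, j], X_minus)  # conditional-mean estimate
    resid = X[:, j] - xj_hat
    losses = []
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = xj_hat + rng.permutation(resid)   # conditional shuffle
        losses.append(np.mean((y - ols_predict(X, y, Xp)) ** 2))
    return np.mean(losses) - base_loss

imp0 = conditional_permutation_importance(X, y, 0)
imp1 = conditional_permutation_importance(X, y, 1)
print(imp0, imp1)
```

On this toy example the signal variable receives a clearly positive importance, while the correlated null variable stays near zero, which is precisely the behavior (unlike marginal permutation) that makes the conditional scheme attractive when covariates are strongly correlated. Grouping, the second direction mentioned above, would amount to shuffling the residuals of a whole block of columns jointly rather than one column at a time.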