Towards Building Interpretable and Robust Deep Neural Networks
Siddhartha Gairola (Ph.D. Student)
Deep Neural Networks (DNNs) have demonstrated great success in a variety of computer vision tasks (as well as in language, speech, and other domains), such as image classification, object detection, semantic segmentation, action recognition, image captioning, and visual question answering. However, these powerful models currently act as black boxes that are extremely hard to interpret. This limits their adoption in everyday life, especially in sensitive applications such as medical diagnosis, autonomous driving, and security. Building interpretable DNNs and deriving meaningful explanations for their predictions is therefore an important area of research. Recent works [1,2] have proposed new architectural modules that can be easily incorporated into existing architectures to make DNNs inherently interpretable. Similar in spirit, over the course of the Ph.D. we will explore novel methods for making DNNs more interpretable, such that the resulting explanations are both faithful and visually sensible. Furthermore, it has been shown that DNNs are brittle and learn features that are prone to adversarial attacks [3,4,5]: a small change in the input can result in drastically different predictions (see the sketch after the reference list). Another key aim of the Ph.D. is to devise training schemes that enable DNNs to learn powerful, generalized representations that are robust to such issues (e.g., adversarial attacks and domain shifts). This would help create DNNs that are more robust, trustworthy, and explainable, allowing these powerful tools to be adopted more easily into real-world applications.

References:

1. Moritz Böhle, Mario Fritz, and Bernt Schiele. "Convolutional Dynamic Alignment Networks for Interpretable Classifications." In CVPR, pages 10029-10038, 2021.
2. Moritz Böhle, Mario Fritz, and Bernt Schiele. "B-cos Networks: Alignment Is All We Need for Interpretability." In CVPR, pages 10329-10338, 2022.
3. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. "Towards Deep Learning Models Resistant to Adversarial Attacks." In ICLR, 2018.
4. Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. "Robustness May Be at Odds with Accuracy." In ICLR, 2019.
5. Nils P. Walter, David Stutz, and Bernt Schiele. "On Fragile Features and Batch Normalization in Adversarial Training." arXiv:2204.12393, 2022.
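To make the brittleness discussed above concrete, the following is a minimal sketch of a single-step, gradient-based adversarial attack (the fast gradient sign method, the building block of the iterative attacks studied in [3]). It assumes a PyTorch image classifier with inputs scaled to [0, 1]; the model, labels, and perturbation budget epsilon are illustrative placeholders, not part of the proposed work.

    # Minimal sketch (assumed PyTorch setup): a tiny, sign-of-gradient perturbation
    # of the input can flip a classifier's prediction.
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=8 / 255):
        """Return an adversarially perturbed copy of the input batch x."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()  # gradient of the loss w.r.t. the input
        with torch.no_grad():
            # Step in the direction that increases the loss, then clip to the valid image range.
            x_adv = x_adv + epsilon * x_adv.grad.sign()
            x_adv = x_adv.clamp(0.0, 1.0)
        return x_adv.detach()

Adversarial training, as in [3], augments training with such perturbed examples so that the learned representations become robust to them; this is one of the robustness directions the Ph.D. will build on.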
Primary Advisor: Bernt Schiele (Max Planck Institute for Informatics & Saarland University)
Industry Advisor: Francesco Locatello (IST Austria)
PhD Duration: 15 September 2022 - 14 September 2025