Davide Bucciarelli
The emergence of Foundation Models has recently revolutionized the fields of Deep Learning and Artificial Intelligence. These large-scale generative models enable natural human-machine interaction through language and exhibit remarkable abilities in text generation, reasoning, and the comprehension of images and text. In this context, the proposed research aims to develop new architectures for foundation models that are (a) inherently multimodal, capable of accepting input and producing output in the form of images, videos, and documents; (b) accurate in their responses, explainable, and trustworthy, with explicit mechanisms for evaluating the quality and reliability of the generated outputs; and (c) innovative in architectural design, going beyond the standard Transformer architecture and the language-centric paradigm that has characterized multimodal model development thus far. Particular attention will be devoted to generative AI, exploring novel generative models and methods to ensure that their outputs are not only powerful and versatile but also safe, transparent, and aligned with human expectations.