Nuren Zhaksylyk
The recent development of large-scale foundation models has delivered remarkably general capabilities, yet the internal mechanisms governing their reliability and adaptability remain largely unexplored. This proposal outlines a PhD project that opens up the black-box nature of these systems, focusing on how multimodal inputs are processed and how internal representations influence output correctness. The project investigates the relationship between discrete text and continuous latent representations, asking how sensitive these models are to input perturbations and whether specific latent embeddings correspond to meaningful, retrievable concepts. By studying these internal dynamics, we aim to understand what changes in the model's internal state when it is steered toward correct answers versus when it hallucinates.
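To make the notion of "steering" concrete, the following is a minimal sketch of one common intervention, activation steering via forward hooks. It assumes a HuggingFace GPT-2-style decoder whose blocks are reachable as model.transformer.h[i]; the steering vector itself is hypothetical, e.g. a contrastive estimate such as the mean activation difference between correct and hallucinated answers. This illustrates the kind of intervention the project would analyze, not a committed method.

```python
import torch

def add_steering_hook(model, layer_idx, steering_vector, alpha=4.0):
    """Register a hook that shifts the activations of one transformer block
    along a concept direction. `steering_vector` is a hypothetical direction
    of shape (hidden_dim,); `alpha` scales the intervention strength."""
    block = model.transformer.h[layer_idx]  # GPT-2-style layout assumed

    def hook(module, inputs, output):
        # Decoder blocks often return a tuple (hidden_states, ...).
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * steering_vector.to(hidden.dtype)
        # Returning a value from a forward hook replaces the block's output.
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    return block.register_forward_hook(hook)
```

Comparing generations and next-token distributions with and without such a hook is one straightforward way to quantify what changes in the model's state under steering.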
In parallel, this research seeks architectural paradigms that allow foundation models to be extended to new modalities efficiently, without complex, resource-intensive training schemes. We intend to compare integration strategies to determine which designs best preserve pre-trained knowledge while accommodating new data types. Furthermore, we plan to probe for implicit priors: knowledge and biases encoded in the model's parameters that are not elicited by standard prompting. The research will investigate methods to detect these hidden priors and techniques to "turn them on or off," thereby improving control over model behavior. This is particularly relevant for high-stakes domains such as healthcare, where aligning the model's vast internal knowledge with factual correctness is critical. Ultimately, this work aims to provide analytical insights and methodological guidelines that help the community build foundation models that are not only more capable but also more transparent, controllable, and robust.
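As one example of the integration strategies to be compared, the sketch below shows a frozen-backbone design in the style of LLaVA-like systems: a new modality enters a frozen language model only through a small trained projector, so pre-trained knowledge is left untouched. All dimensions and names here are hypothetical placeholders, not a proposed architecture.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Maps features from a frozen modality encoder (e.g. a vision or audio
    backbone) into the embedding space of a frozen language model."""

    def __init__(self, encoder_dim=1024, llm_dim=4096):
        super().__init__()
        # Only this projection is trained; encoder and LLM stay frozen.
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, modality_features, text_embeddings):
        # modality_features: (batch, n_patches, encoder_dim)
        # text_embeddings:   (batch, n_tokens, llm_dim)
        soft_tokens = self.proj(modality_features)
        # Prepend projected features as soft tokens; the concatenation is
        # then passed to the frozen LLM as its input embeddings.
        return torch.cat([soft_tokens, text_embeddings], dim=1)
```

Because only the projector carries gradients, such a design is one natural baseline against heavier alternatives (e.g. inserted cross-attention layers or partial fine-tuning) when measuring how well each strategy preserves pre-trained knowledge.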