Alexander Panfilov

PhD

https://github.com/kotekjedi

A Jailbreaking Perspective on LLMs Safety

If deep learning systems are inherently brittle, does this doom robust value alignment in large language models to inevitable failure through jailbreaking attacks? Or should our concern be tempered, given that current jailbreaking ends may not justify the computational means they require? This project aims to sharpen our understanding of LLM safety in adversarial scenarios. We focus on rigorous and fair evaluation of existing attacks, assessing the true harmful potential of these models. By incorporating insights from the adversaries' perspective, we aim to identify critical vulnerabilities in current LLMs and pave the way for more effective safety measures in the next generation of more capable models.

Track:

Academic Track

Primary Advisor