Sai Advaith Maddipatla
AlphaFold and other sequence-to-structure predictors are trained primarily on X-ray crystallography structures, which are modeled from electron density maps. Although crystallographic data inherently capture some conformational heterogeneity, AlphaFold typically collapses its prediction to a single conformer, overlooking the structural diversity that is intrinsic to proteins and present in the experimental measurements. This limitation is particularly acute for NMR and cryo-EM structures, where proteins are explicitly represented as ensembles.
We propose to develop experiment-grounded protein generative models that accurately reproduce the full spectrum of conformational states observed in protein ensembles. Our approach fine-tunes AlphaFold on high-quality experimental ensembles, producing an inductive model that captures conformational diversity directly from the amino acid sequence.
If successful, our approach could significantly advance protein design by shifting from single-structure objectives to distributional design targets. This capability would address a critical gap in existing computational pipelines, where neglecting backbone flexibility can lead to suboptimal predictions of binding affinities and enzymatic activity.