Valentinos Pariza
Self-supervised learning (SSL) has become a cornerstone for training general-purpose vision models without manual annotation, yet current approaches often rely on brittle training objectives and inherit uncontrolled data biases that limit robustness and generalization. While large-scale SSL has enabled strong performance on standard benchmarks, many vision encoders still overfit to spurious correlations such as positional cues, background statistics, and dataset-specific shortcuts, leading to degraded transfer on dense prediction, geometric, and out-of-distribution tasks.
This project aims to advance self-supervised vision learning by systematically rethinking how representations are shaped across the pre-training, training, and post-training stages. Rather than introducing new supervision sources, the research focuses on principled loss design, robust data curation, and representation regularization to encourage semantically meaningful and transferable visual features. By developing training objectives that operate at multiple spatial and semantic scales, together with post-training mechanisms that identify and suppress shortcut signals, the project seeks to improve the generalization and interpretability of vision encoders. Ultimately, the goal is to deliver scalable, open, and bias-aware self-supervised learning frameworks that support a wide range of downstream vision tasks beyond image classification.
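To make the notion of objectives operating at multiple spatial scales concrete, the minimal PyTorch sketch below combines an image-level and a patch-level alignment term between two augmented views. This is an illustrative assumption, not the project's actual method: the function name `multi_scale_ssl_loss`, the mean-pooling choice, and the weighting `lam` are all hypothetical.

```python
# Illustrative sketch only: a hypothetical multi-scale SSL objective
# combining a global (image-level) and a local (patch-level) term.
import torch
import torch.nn.functional as F

def multi_scale_ssl_loss(student_patches, teacher_patches, lam=0.5):
    """Align two views at both the image and patch scale.

    student_patches, teacher_patches: (B, N, D) patch embeddings from two
    augmented views; the teacher branch is treated as a fixed target.
    """
    teacher_patches = teacher_patches.detach()  # stop-gradient on the target branch

    # Global term: align mean-pooled, image-level representations.
    s_global = F.normalize(student_patches.mean(dim=1), dim=-1)
    t_global = F.normalize(teacher_patches.mean(dim=1), dim=-1)
    global_loss = (2 - 2 * (s_global * t_global).sum(dim=-1)).mean()

    # Local term: align spatially corresponding patch embeddings.
    s_local = F.normalize(student_patches, dim=-1)
    t_local = F.normalize(teacher_patches, dim=-1)
    local_loss = (2 - 2 * (s_local * t_local).sum(dim=-1)).mean()

    # Weighted sum trades off semantic (global) vs. spatial (local) alignment.
    return lam * global_loss + (1 - lam) * local_loss

# Example: B=4 images, N=196 patches (a 14x14 ViT grid), D=768 features.
if __name__ == "__main__":
    s = torch.randn(4, 196, 768)
    t = torch.randn(4, 196, 768)
    print(multi_scale_ssl_loss(s, t).item())
```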