Yongkang Li
Robustness is essential to modern Information Retrieval (IR) systems. A robust system improves the user experience by returning consistent, reliable results, and it can also reduce operational costs. This project focuses on a critical component of modern IR: systems that use embeddings to represent text. We investigate the robustness of these embedding-based systems from two perspectives: model generalization and resilience to adversarial attacks. Model generalization is the system's ability to maintain high performance across diverse scenarios, datasets, and tasks. Adversarial resilience is the system's ability to withstand malicious inputs specifically designed to degrade its performance. For each of these two dimensions, we will first conduct a thorough evaluation to identify vulnerabilities and performance gaps, and then explore and propose novel methods to enhance robustness. The ultimate goal is to develop methodologies that lead to more reliable and trustworthy embedding-based retrieval systems.