Kidist Mekonnen
This PhD project investigates how to make generative information retrieval (GenIR) systems more robust, adaptive, and reproducible as document collections evolve. GenIR models retrieve documents by generating semantic identifiers directly from queries, simplifying the retrieval pipeline but introducing challenges in dynamic settings where new documents are continuously added and retraining is costly.
So far, the project has (i) introduced Direct Document Relevance Optimization for GenIR (SIGIR 2025), and (ii) developed low-resource passage retrieval benchmarks for Amharic, supporting evaluation in an under-represented language (Findings of ACL 2025). The current focus is continual adaptation in dynamic corpora: incorporating new documents while maintaining retrieval performance on earlier corpus slices, i.e., without forgetting.
The project compares strategies such as parameter-efficient updates (e.g., adapters, LoRA, or partial freezing), index-aware constrained decoding (using dynamic prefix tries), and hard-negative replay that mines confuser documents across temporal partitions. Evaluation spans retention, adaptation, and efficiency (e.g., training cost, latency), and emphasizes reproducibility through versioned corpus partitions, fixed checkpoints, and released code.
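To make the idea of index-aware constrained decoding concrete, the following is a minimal sketch of a dynamic prefix trie, not the project's implementation. It assumes document identifiers are sequences of integer token ids; the class name `DocIdTrie` and its methods are hypothetical and chosen only for illustration. At each decoding step the model would be restricted to the tokens returned by `allowed_next`, so it can only generate identifiers that exist in the current index, and new documents become reachable as soon as their identifiers are added.

```python
from typing import Dict, List, Sequence


class DocIdTrie:
    """Prefix trie over document-identifier token sequences (illustrative)."""

    def __init__(self) -> None:
        self.children: Dict[int, "DocIdTrie"] = {}
        self.is_end = False  # True if a complete identifier ends at this node

    def add(self, docid: Sequence[int]) -> None:
        """Insert one identifier; can be called at any time as the corpus grows."""
        node = self
        for tok in docid:
            node = node.children.setdefault(tok, DocIdTrie())
        node.is_end = True

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Return the token ids that may follow `prefix` in some indexed identifier."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []  # prefix matches no indexed identifier
            node = node.children[tok]
        return sorted(node.children)


# Hypothetical identifiers for an initial corpus slice.
trie = DocIdTrie()
for docid in ([3, 1, 4], [3, 1, 5], [3, 2, 7]):
    trie.add(docid)

before_update = trie.allowed_next([3])

# A later slice adds a document; no retraining is needed to make it decodable.
trie.add([3, 9, 9])
after_update = trie.allowed_next([3])
```

In a sketch like this the trie is updated independently of the model weights, which is why it pairs naturally with parameter-efficient updates: the index constraint changes immediately, while the model is adapted separately to rank the new identifiers well.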
The overarching goal is to develop GenIR systems that remain reliable and efficient across evolving corpora, enabling broader deployment in domains such as web archives, scientific discovery, and multilingual information access.