Beyond effectiveness, the robustness of information retrieval (IR) systems is attracting increasing attention. When deployed, a critical technology such as IR should not only deliver strong performance on average but also handle a variety of exceptional situations. In recent years, research into the robustness of IR has grown significantly, with many researchers offering extensive analyses and proposing a range of strategies to address robustness challenges.
In this tutorial, we first provide background covering the basics and a taxonomy of robustness in IR. Then, we examine adversarial robustness and out-of-distribution (OOD) robustness in IR-specific contexts, reviewing recent progress in methods that enhance robustness. The tutorial concludes with a discussion of the robustness of IR in the context of large language models (LLMs), highlighting ongoing challenges and promising directions for future research. This tutorial aims to draw broader attention to robustness issues in IR, facilitate an understanding of the relevant literature, and lower the barrier to entry for interested researchers and practitioners.
Time | Section | Presenter |
---|---|---|
13:30 - 13:50 | Section 1: Introduction | Maarten de Rijke |
13:50 - 14:10 | Section 2: Preliminaries | Yu-An Liu |
14:10 - 15:00 | Section 3: Adversarial robustness | Yu-An Liu |
15:00 - 15:30 | 30min coffee break | |
15:30 - 16:20 | Section 4: Out-of-distribution robustness | Yu-An Liu |
16:20 - 16:30 | Section 5: Robust IR in the age of LLMs | Yu-An Liu |
16:30 - 16:50 | Section 6: Challenges and future directions | Maarten de Rijke |
16:50 - 17:00 | Q & A | All |
A curated list of papers related to robustness in IR can be found at Awesome Robustness in Information Retrieval.
The tutorial extensively covers papers highlighted in bold.
Adversarial retrieval attack
Adversarial ranking attack
Topic-oriented adversarial retrieval/ranking attack
Surrogate model training
Pre-defined position
Output-guided position
Gradient-guided position
Word substitution
Trigger sentence
Multi-granular
Encoding error
Grammatical error
Static: greedy search
Dynamic: reinforcement learning
Data augmentation
Traditional adversarial training
Theory-guided adversarial training
Certified robustness
Perplexity-based detection
Language-based detection
Learning-based detection
Data augmentation
Domain modeling
Architectural modifications
Scaling up the model capacity
Continual learning for dense retrieval
Continual learning for generative retrieval
Self-teaching
Contrastive learning
Hybrid training
@inproceedings{liu2024robust,
  author    = {Liu, Yu-An and Zhang, Ruqing and Guo, Jiafeng and de Rijke, Maarten},
  title     = {Robust Information Retrieval},
  year      = {2024},
  booktitle = {SIGIR},
}