The SIGIR 2024 Tutorial:
Robust Information Retrieval

¹CAS Key Lab of Network Data Science and Technology, ICT, CAS, University of Chinese Academy of Sciences; ²University of Amsterdam

Sunday, July 14th, 1:30 PM - 5:00 PM (EDT) @ South American A

About this tutorial

Beyond effectiveness, the robustness of information retrieval (IR) systems is attracting increasing attention. When deployed, a critical technology such as IR should not only deliver strong performance on average but also handle a variety of exceptional situations. In recent years, research into the robustness of IR has grown significantly, with numerous researchers offering extensive analyses and proposing a range of strategies to address robustness challenges.

In this tutorial, we first provide background covering the basics and a taxonomy of robustness in IR. We then examine adversarial robustness and out-of-distribution (OOD) robustness in IR-specific contexts, extensively reviewing recent progress in methods to enhance robustness. The tutorial concludes with a discussion of the robustness of IR in the context of large language models (LLMs), highlighting ongoing challenges and promising directions for future research. This tutorial aims to draw broader attention to robustness issues in IR, facilitate an understanding of the relevant literature, and lower the barrier to entry for interested researchers and practitioners.

Schedule

Time | Section | Presenter
13:30 - 13:50 | Section 1: Introduction | Maarten de Rijke
13:50 - 14:10 | Section 2: Preliminaries | Yu-An Liu
14:10 - 15:00 | Section 3: Adversarial robustness | Yu-An Liu
15:00 - 15:30 | Coffee break (30 min) |
15:30 - 16:20 | Section 4: Out-of-distribution robustness | Yu-An Liu
16:20 - 16:30 | Section 5: Robust IR in the age of LLMs | Yu-An Liu
16:30 - 16:50 | Section 6: Challenges and future directions | Maarten de Rijke
16:50 - 17:00 | Q & A | All

Benchmark


Perspective Papers


Reading List

A curated list of papers related to robustness in IR can be found at Awesome Robustness in Information Retrieval.

The tutorial extensively covers papers highlighted in bold.


Section 3: Adversarial robustness

3.1 Adversarial attacks

3.1.0 Classification of adversarial attack tasks

Adversarial retrieval attack


Adversarial ranking attack


Topic-oriented adversarial retrieval/ranking attack


3.1.1 Steal knowledge from black-box models

Surrogate model training


3.1.2 Identify vulnerable positions in documents

Pre-defined position


Output-guided position


Gradient-guided position


3.1.3 Add perturbation to identified positions

3.1.3.1 Perturbation type

Word substitution


Trigger sentence


Multi-granular


Encoding error


Grammatical error


3.1.3.2 Perturbation strategy

Static: greedy search


Dynamic: reinforcement learning


3.2 Adversarial defenses

3.2.1 Empirical defense

Data augmentation


Traditional adversarial training


Theory-guided adversarial training


3.2.2 Certified defense

Certified robustness


3.2.3 Attack detection

Perplexity-based detection


Language-based detection


Learning-based detection


Section 4: Out-of-distribution robustness

4.1 OOD generalizability on unforeseen documents

4.1.1 Adaptation to new corpus

Data augmentation


Domain modeling


Architectural modifications


Scaling up the model capacity


4.1.2 Updates to a corpus

Continual learning for dense retrieval


Continual learning for generative retrieval


4.2 OOD generalizability on unforeseen queries

4.2.1 Query variation

Self-teaching


Contrastive learning


Hybrid training


4.2.2 Unseen query type

BibTeX

@inproceedings{liu2024robust,
author = {Liu, Yu-An and Zhang, Ruqing and Guo, Jiafeng and de Rijke, Maarten},
title = {Robust Information Retrieval},
year = {2024},
booktitle = {SIGIR},
}