PyTerrier Data Repository

About This Site

This site is a repository of indices for PyTerrier and Terrier.

Vaswani

Last Update 2021-07-04
3 index variants
The Vaswani NPL corpus is a small test collection of 11,000 abstracts that has been used by the Glasgow IR group for many years (created 1990). Due to its small size, it is used in many test cases for both Terrier and PyTerrier.
More details →

MSMARCO Document Ranking

Last Update 2021-06-13
4 index variants
A document ranking corpus containing 3.2 million documents. Also used by the TREC Deep Learning track.
More details →

MSMARCO Passage Ranking

Last Update 2021-06-12
6 index variants
A passage ranking task based on a corpus of 8.8 million passages released by Microsoft, which should be ranked by their relevance to questions. Also used by the TREC Deep Learning track.
More details →

MSMARCOv2 Document Ranking

Last Update 2021-07-04
1 index variant
A document ranking corpus containing 11.9 million documents. Also used by the TREC 2021 Deep Learning track.
More details →