PyTerrier Data Repository

About This Site

This site is a repository of indices for PyTerrier and Terrier.

Vaswani

Last Update 2021-09-17
4 index variants
The Vaswani NPL corpus is a small test collection of 11,000 abstracts, created in 1990, that has been used by the Glasgow IR group for many years. Due to its small size, it is used for many test cases in both Terrier and PyTerrier.
More details →

MSMARCO Document Ranking

Last Update 2021-06-13
4 index variants
A document ranking corpus containing 3.2 million documents. Also used by the TREC Deep Learning track.
More details →

MSMARCO Passage Ranking

Last Update 2021-08-06
6 index variants
A passage ranking task based on a corpus of 8.8 million passages released by Microsoft, in which passages are ranked by their relevance to questions. Also used by the TREC Deep Learning track.
More details →

MSMARCO v2 Document Ranking

Last Update 2021-07-05
2 index variants
A new version of the MSMARCO document ranking corpus, containing 11.9 million documents. Also used by the TREC 2021 Deep Learning track.
More details →

MSMARCO v2 Passage Ranking

Last Update 2021-08-08
2 index variants
A revised corpus of 138 million passages released by Microsoft in July 2021, in which passages are ranked by their relevance to questions. Also used by the TREC 2021 Deep Learning track.
More details →