Publication | Closed Access
PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval
Citations: 85 | References: 23 | Year: 2021
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Intelligent Information Retrieval, Dense Retrieval, Learning To Rank, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Computational Linguistics, Query Expansion, Retrieval Technique, Expressive Python Operators, Search Technology, Knowledge Discovery, Computer Science, Search Engine Indexing, Search Technique, Pyterrier Framework, Transformer Pipelines, Interactive Information Retrieval
PyTerrier is a Python-based retrieval framework for expressing simple and complex information retrieval (IR) pipelines in a declarative manner. While it relies on the long-established Terrier IR platform for basic text indexing and retrieval, its main utility comes from its expressive Python operators, which allow individual IR operations to be pipelined and combined flexibly, as the search application requires. Each operation applies a transformation to a dataframe, and the operators have clear semantics defined in relational algebra. Going further, we have recently expanded the PyTerrier framework to support state-of-the-art BERT-based text re-rankers (such as EPIC) and dense retrieval implementations (such as ANCE and ColBERT). Transformer pipelines can be tuned and evaluated in a declarative manner. To increase the reusability of this framework as a resource for the IR community, PyTerrier provides easy access to a variety of standard benchmark datasets, including pre-built indices. Finally, we highlight the advantages of such a framework for information retrieval researchers and educators.
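The idea of composing retrieval stages with operators can be illustrated with a small sketch. This is a toy model in plain Python, not the PyTerrier API: the `Transformer` class, the `term_overlap` retriever, the `prefer_short` re-ranker, and the three-document corpus are all hypothetical, and lists of dicts stand in for dataframes. The operator symbols are chosen for illustration only.

```python
# Toy illustration only: NOT the PyTerrier API. Lists of dicts stand in
# for dataframes with qid, query, docno, and score columns.

class Transformer:
    """A stage that maps a list of rows to a new list of rows."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, rows):
        return self.fn(rows)

    def __rshift__(self, other):
        # a >> b: feed the output of a into b (composition).
        return Transformer(lambda rows: other(self(rows)))

    def __mod__(self, k):
        # a % k: keep the k highest-scoring rows for each query.
        def cutoff(rows):
            by_qid = {}
            for r in self(rows):
                by_qid.setdefault(r["qid"], []).append(r)
            out = []
            for qid_rows in by_qid.values():
                ranked = sorted(qid_rows, key=lambda r: r["score"], reverse=True)
                out.extend(ranked[:k])
            return out
        return Transformer(cutoff)

    def __add__(self, other):
        # a + b: run both stages on the same input and sum the scores
        # of rows that share (qid, docno).
        def combine(rows):
            totals = {}
            for r in self(rows) + other(rows):
                key = (r["qid"], r["docno"])
                totals[key] = totals.get(key, 0.0) + r["score"]
            return [{"qid": q, "docno": d, "score": s}
                    for (q, d), s in totals.items()]
        return Transformer(combine)


# Hypothetical three-document corpus.
DOCS = {
    "d1": "dense retrieval with bert",
    "d2": "bm25 retrieval baseline",
    "d3": "python dataframe pipelines",
}


def term_overlap(rows):
    # Hypothetical first-stage retriever: score = number of shared terms.
    out = []
    for q in rows:
        terms = set(q["query"].split())
        for docno, text in DOCS.items():
            score = float(len(terms & set(text.split())))
            if score > 0:
                out.append({"qid": q["qid"], "query": q["query"],
                            "docno": docno, "score": score})
    return out


def prefer_short(rows):
    # Hypothetical re-ranker: divide each score by the document length.
    return [dict(r, score=r["score"] / len(DOCS[r["docno"]].split()))
            for r in rows]


first_stage = Transformer(term_overlap)
rerank = Transformer(prefer_short)

# Retrieve, keep the top 2 per query, then re-rank the survivors.
pipeline = (first_stage % 2) >> rerank
results = pipeline([{"qid": "q1", "query": "dense retrieval"}])
print(results)
```

The pipeline is declared as an expression before any data flows through it, so the same object can be applied to different query sets or embedded in a larger pipeline. In this sketch every stage consumes and produces the same row shape, which is what lets the stages be combined freely.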