What is BeerQA?
BeerQA is an open-domain question answering dataset whose questions require information from one or more Wikipedia documents to answer, presenting a more realistic challenge for open-domain question answering systems. BeerQA was constructed from the Stanford Question Answering Dataset (SQuAD) and the HotpotQA dataset by a team of NLP researchers at JD AI Research, Samsung Research, and Stanford University.
For more details about BeerQA, please refer to our EMNLP 2021 paper:
Getting started
BeerQA is distributed under a CC BY-SA 4.0 License. The training and development sets can be downloaded below.
A more comprehensive guide to data download, preprocessing, baseline model training, and evaluation is included in our GitHub repository, linked below.
Once you have built your model, you can use the evaluation script we provide below to evaluate model performance by running `python eval_beerqa.py <path_to_gold> <path_to_prediction>`.
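For readers who want to understand what the evaluation measures before running the script: open-domain QA evaluation of this kind typically reports exact match (EM) and token-level F1 over normalized answer strings, following the SQuAD convention. The sketch below is an illustration of those standard metrics, not the actual contents of `eval_beerqa.py`; the function names and normalization details here are assumptions.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    drop articles (a/an/the), and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall
    between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "Barack Obama" against the gold answer "Obama" gets EM 0 but a nonzero F1, since one of the two predicted tokens overlaps with the gold answer.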
To submit your models and evaluate them on the official test sets, please read our submission guide hosted on Codalab.
We also release the processed Wikipedia dump used in creating BeerQA (also under a CC BY-SA 4.0 License). It serves both as the corpus for the fullwiki setting in our evaluation and, we hope, as a standalone resource for future research involving processed Wikipedia text. The link to the documentation for this corpus can be found below.
Reference
If you use BeerQA in your research, please cite our paper with the following BibTeX entry:
@inproceedings{qi2021answering,
  title = {Answering Open-Domain Questions of Varying Reasoning Steps from Text},
  author = {Qi, Peng and Lee, Haejun and Sido, Oghenetegiri "TG" and Manning, Christopher D.},
  booktitle = {Empirical Methods for Natural Language Processing ({EMNLP})},
  year = {2021}
}
| Rank | Model | Code | Ext. Res. | SQuAD Open EM | SQuAD Open F1 | HotpotQA EM | HotpotQA F1 | 3+ Hop Challenge EM | 3+ Hop Challenge F1 | Macro Avg EM | Macro Avg F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 (Nov 1, 2021) | Baseline Model: IRRR (single model), JD AI Research, Samsung Research, & Stanford University (Qi, Lee, Sido, and Manning, EMNLP 2021) | | | 61.06 | 67.87 | 58.12 | 69.29 | 34.15 | 40.72 | 51.11 | 59.30 |