Polish Question Answering
Collection of models and datasets for Polish Question Answering.
Sentence Similarity • Updated • 2.41k • 10
Note: SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/silver-retriever-base-v1
Sentence Similarity • Updated • 326 • 10
Note: SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/polqa
Updated • 202 • 10
Note: PolQA is the first Polish dataset for open-domain question answering. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7 million candidate passages. The dataset can be used to train both a passage retriever and an abstractive reader.
ipipan/maupqa
Updated • 284 • 5
Note: MAUPQA is a collection of 14 datasets for Polish document retrieval. Most of the datasets are either machine-generated or machine-translated from English. Across all datasets, it contains over 1M questions, 1M positive question-passage pairs, and 7M hard-negative question-passage pairs.
clarin-pl/poquad
Viewer • Updated • 52k • 432 • 4
Note: PoQuAD is the Polish equivalent of SQuAD. It consists of more than 70,000 question-passage pairs, as well as extractive and abstractive answers.
allegro/polish-question-passage-pairs
Viewer • Updated • 10.4k • 40 • 4
Note: Over 10,000 manually annotated question-passage pairs. While the questions are taken from the PolQA dataset, the passages are often unique. In particular, the dataset consists mostly of hard negatives (8k pairs).
allegro/klej-dyk
Viewer • Updated • 5.18k • 197 • 1
Note: The "Czy wiesz?" (Eng. "Did you know?") dataset consists of almost 5k question-passage pairs obtained from the "Czy wiesz..." section of Polish Wikipedia. Each question is written by a Wikipedia collaborator and is answered with a link to a relevant Wikipedia article.
piotr-rybak/allegro-faq
Viewer • Updated • 1.88k • 23
Note: Allegro FAQ is one of the PolEval 2022 test sets. It consists of 900 frequently asked questions and 921 help articles about Allegro.com, a large Polish e-commerce platform. Each question-passage pair is manually checked and edited where necessary.
piotr-rybak/legal-questions
Updated • 77
Note: Legal Questions is one of the PolEval 2022 test sets. It consists of 718 questions and approximately 26,000 passages extracted from over 1,000 acts of law.
Polish Information Retrieval Benchmark (PIRB) • 29
📈 View evaluation results on a leaderboard
Note: The benchmark for Polish Information Retrieval, consisting of 41 datasets.
sdadas/mmlw-retrieval-roberta-base
Sentence Similarity • Updated • 779 • 1
Note: Neural text encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=mmlw
sdadas/gpt-exams
Viewer • Updated • 8.13k • 60 • 3
Note: The dataset contains 8,131 multi-domain question-answer pairs. It was created semi-automatically using the gpt-3.5-turbo-0613 model available in the OpenAI API.
apohllo/plt5-base-poquad
Text2Text Generation • Updated • 483 • 1
Note: This is a plT5-base model trained on the PoQuAD dataset. It was trained in a single experiment run, so don't expect state-of-the-art results.
sdadas/polish-reranker-large-ranknet
Text Classification • Updated • 907 • 2
Note: Cross-encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=reranker
amu-cai/PES-2018-2022
Viewer • Updated • 35.6k • 63 • 3
Note: This dataset contains 297 Polish Board Certification Examinations from 2018-2022 in the form of multiple-choice questions.
OrlikB/KartonBERT-USE-base-v1
Sentence Similarity • Updated • 3.91k • 9
Note: This universal sentence encoder model aims to be proficient in tasks involving sentence and document similarity.
sdadas/polish-reranker-roberta-v2
Text Classification • Updated • 914 • 2
Note: This is an improved version of the reranker, based on sdadas/polish-roberta-large-v2 and trained with RankNet loss on a large dataset of text pairs.
sdadas/stella-pl-retrieval
Sentence Similarity • Updated • 770 • 9
Note: This is a text encoder based on stella_en_1.5B_v5 and further fine-tuned for Polish information retrieval tasks.
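
The note for sdadas/polish-reranker-roberta-v2 mentions training with RankNet loss. As a rough illustration only, the sketch below shows the standard pairwise RankNet formulation, where a pair's loss is log(1 + exp(-(s_pos - s_neg))) and s_pos is the score of the passage that should rank higher. The function name `ranknet_loss` and the toy scores are hypothetical, and nothing here is taken from the model's actual training code, which may differ in details such as batching or pair sampling.

```python
import math


def ranknet_loss(score_pos: float, score_neg: float) -> float:
    """Pairwise RankNet loss for one (relevant, non-relevant) pair.

    Hypothetical helper for illustration: score_pos is the reranker score
    of the passage that should rank higher, score_neg the score of the other.
    """
    diff = score_pos - score_neg
    # log(1 + exp(-diff)), split into two branches to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Toy scores (made up): the loss is small when the relevant passage
# already scores higher, and large when the order is inverted.
well_ordered = ranknet_loss(2.0, 0.0)
inverted = ranknet_loss(0.0, 2.0)
print(well_ordered < inverted)
```

When both passages get the same score, the loss equals log 2, and it approaches zero as the relevant passage's margin grows.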