A curated collection of machine translation datasets
Pietro Lesci
pietrolesci
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
Updated a dataset 7 days ago: pietrolesci/minipile
Updated a model 8 days ago: pietrolesci/raw_tokenisers
Updated a model 8 days ago: pietrolesci/tokenisers
Spaces: 1
Models (14)
pietrolesci/raw_tokenisers
pietrolesci/tokenisers
pietrolesci/SmolLM-34M-tok32000
pietrolesci/bert-civilcomments-gradtracking
pietrolesci/roberta-base_mnli_b9799b8f9b
pietrolesci/bert-base-uncased_mnli_53fb0761e0
pietrolesci/bert-tiny_mnli_cdc7ea0d50
pietrolesci/pythia-14m_2024-01-17T00-07-52
pietrolesci/gpt2_wikitext-103-raw-v1_L2-H4-E256-C256
pietrolesci/gpt2_wikitext-2-raw-v1_L2-H4-E256-C256
Datasets (54)
pietrolesci/minipile • 6.06M rows • 147 downloads
pietrolesci/opus-5langs-1M • 5M rows • 77 downloads
pietrolesci/opus-raw • 4.06B rows • 262 downloads
pietrolesci/finewebedu-20BT • 207 downloads
pietrolesci/fineweb-edu-10BT • 164 downloads
pietrolesci/_minipile • 2.51M rows • 100 downloads
pietrolesci/pythia-pile-stats • 113M rows • 42 downloads
pietrolesci/slim-pajama-eval • 1.84M rows • 53 downloads • 1 like
pietrolesci/pile-subset • 26 downloads
pietrolesci/pile_preshuffled_seeds • 22 downloads • 1 like