Update README.md
README.md (CHANGED)
@@ -15,9 +15,9 @@ thumbnail: https://gsarti.com/publication/it5/featured.png
 
 The [IT5](https://huggingface.co/models?search=it5) model family represents the first effort in pretraining large-scale sequence-to-sequence transformer models for the Italian language, following the approach adopted by the original [T5 model](https://github.com/google-research/text-to-text-transfer-transformer).
 
-This model is released as part of the project ["IT5: Large-Scale Text-to-Text Pretraining for Italian Language Understanding and Generation"](https://
+This model is released as part of the project ["IT5: Large-Scale Text-to-Text Pretraining for Italian Language Understanding and Generation"](https://arxiv.org/abs/2203.03759) (to be released), by [Gabriele Sarti](https://gsarti.com/) and [Malvina Nissim](https://malvinanissim.github.io/), with the support of [Huggingface](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104) and with TPU usage sponsored by Google's [TPU Research Cloud](https://sites.research.google/trc/). All training was conducted on a single TPU v3-8 VM on Google Cloud. Refer to the TensorBoard tab of the repository for an overview of the training process.
 
-*The inference widget is deactivated because the model needs a task-specific seq2seq fine-tuning on a downstream task to be useful in practice. The
+*The inference widget is deactivated because the model needs task-specific seq2seq fine-tuning on a downstream task to be useful in practice. The models in the [`it5`](https://huggingface.co/it5) organization provide some examples of this model fine-tuned on various downstream tasks.*
 
 ## Model variants

@@ -75,4 +75,13 @@ For problems or updates on this model, please contact [gabriele.sarti996@gmail.com]
 
 ## Citation Information
 
-
+```bibtex
+@article{sarti-nissim-2022-it5,
+  title={IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
+  author={Sarti, Gabriele and Nissim, Malvina},
+  journal={ArXiv preprint 2203.03759},
+  url={https://arxiv.org/abs/2203.03759},
+  year={2022},
+  month={mar}
+}
+```
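As a companion to the fine-tuning note added in the diff above, here is a minimal sketch of what task-specific seq2seq fine-tuning of an IT5 checkpoint could look like with the `transformers` library. The `gsarti/it5-base` checkpoint name, the `riassumi:` task prefix, and the toy sentence pair are illustrative assumptions, not part of this commit:

```python
# Minimal sketch: one fine-tuning step for an IT5 checkpoint on a seq2seq task.
# Assumes the `gsarti/it5-base` checkpoint and a single toy example; a real
# setup would use a full dataset, batching, and a learning-rate schedule.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-base")

# Hypothetical summarization-style (source, target) pair, purely illustrative.
source = "riassumi: Il Colosseo è un anfiteatro di epoca romana situato nel centro di Roma."
target = "Il Colosseo è un anfiteatro romano a Roma."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
optimizer.zero_grad()

# T5-style models compute the standard cross-entropy loss over target tokens
# when `labels` is passed; decoder inputs are derived internally.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```

In practice this single update would be iterated over a task dataset, e.g. with the `Trainer` API or a custom loop; the snippet only shows the shape of one step and why an unadapted checkpoint is not directly usable in the inference widget.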