QCRI /

Translation · Safetensors · m2m_100

BaselMousi committed on
Commit 3d8e658 · verified · 1 Parent(s): 6425878

Update README.md

Files changed (1)
  1. README.md +43 -3
README.md CHANGED
@@ -1,3 +1,43 @@
- ---
- license: cc-by-4.0
- ---

+ ---
+ license: cc-by-4.0
+ base_model:
+ - facebook/nllb-200-3.3B
+ pipeline_tag: translation
+ ---
+ # AraDiCE-msa-to-egy: An MSA-to-Egyptian Machine Translation Model Based on NLLB-3.3B
+
+ This repository contains a Modern Standard Arabic (MSA) to Egyptian Arabic machine translation model fine-tuned from NLLB-200-3.3B. The model was used to curate benchmarks for the AraDiCE paper (citation below). The human post-edited benchmarks can be found <a href="https://huggingface.co/datasets/QCRI/AraDiCE" target="_blank" style="margin-right: 15px; margin-left: 10px">here.</a>
+
+ ## Sample Usage
+
+ ```python
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("QCRI/AraDiCE-msa-to-egy")
+ model = AutoModelForSeq2SeqLM.from_pretrained("QCRI/AraDiCE-msa-to-egy")
+
+ article = "من مصلحتك أن ترحل من كازابلانكا لفترة. هناك موقع لفرنسا الحرة بالقرب من برازفيل. قد أسهل لك العبور."
+ inputs = tokenizer(article, return_tensors="pt")
+
+ translated_tokens = model.generate(  # force the first generated token to be the Egyptian Arabic code arz_Arab
+     **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("arz_Arab"), max_length=30
+ )
+ translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
+ print(translation)
+ ```
+ ## License
+
+ The model is distributed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)**. The full license text can be found in the accompanying `licenses_by-nc-sa_4.0_legalcode.txt` file.
+
+ ## Citation
+ The AraDiCE paper is available <a href="https://arxiv.org/pdf/2409.11404" target="_blank" style="margin-right: 15px; margin-left: 10px">here.</a>
+
+ ```
+ @article{mousi2024aradicebenchmarksdialectalcultural,
+   title={{AraDiCE}: Benchmarks for Dialectal and Cultural Capabilities in LLMs},
+   author={Basel Mousi and Nadir Durrani and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain and Tameem Kabbani and Fahim Dalvi and Shammur Absar Chowdhury and Firoj Alam},
+   year={2024},
+   journal={arXiv preprint arXiv:2409.11404},
+   url={https://arxiv.org/abs/2409.11404},
+ }
+ ```
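
The sample usage in the README translates a three-sentence MSA paragraph in a single call with `max_length=30`, which may cut the Egyptian Arabic output short. A minimal sketch of one workaround, assuming the `tokenizer` and `model` objects are loaded exactly as in that sample: split the input on sentence-final punctuation and translate the sentences as one padded batch. The helper names `split_sentences` and `translate_batch`, the regex splitter, and the default `max_length=128` are hypothetical illustration choices, not part of the QCRI repository or its documented usage.

```python
import re
from typing import List

# Hypothetical helper (not part of the AraDiCE repository): split after a
# sentence-final ".", "!", "?" or the Arabic question mark "؟" when it is
# followed by whitespace. A rough heuristic, not a full Arabic segmenter.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty, whitespace-stripped sentences of `text`."""
    return [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]


def translate_batch(model, tokenizer, sentences: List[str],
                    tgt_lang: str = "arz_Arab", max_length: int = 128) -> List[str]:
    """Translate `sentences` in one padded batch (hypothetical helper).

    `model` and `tokenizer` are assumed to be the AutoModelForSeq2SeqLM and
    AutoTokenizer objects from the sample usage. `max_length=128` is an
    assumed default, not a value taken from the model card.
    """
    # Pad shorter sentences so the whole batch forms one rectangular tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model.generate(
        **inputs,
        # Force decoding to start with the target language code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=max_length,
    )
    # One decoded string per input sentence, in the same order.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the sample's objects in scope, `translate_batch(model, tokenizer, split_sentences(article))` would return one Egyptian Arabic string per MSA sentence. The regex splits after any `.`, `!`, `?` or `؟` followed by whitespace, abbreviations included, so a dedicated Arabic sentence segmenter may be preferable for real data.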