# Fusion NER Models

Here you can find NER models for the Fusion project!

# Table of contents:

1. [**NER-Models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#ner-models)
2. [**Results**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#results)
3. [**Hebrew NLP models**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#hebrew-nlp-models)
4. [**Footnotes**](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#footnotes)

# NER Models:

Here you can find a description of each of our models. Each row contains the model nickname, training description, model path (link), source dataset (with link), base model, entity types and trainer.

| Model name | Model description | Model path | Datasets | Link to dataset | Base model | Entity types | Trainer |
|:----------|:------------------|:-----------|:--------:|:----------------|:----------:| :----------- | :-----: |
| **Basic** | Basic training on IAHALT | [FusioNER/Basic_IAHALT](https://huggingface.co/FusioNER/Basic_IAHALT) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Vitaly** | Vitaly's training on IAHALT (with the [BI-BI problem](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#3-bi-bi-problem)) | [FusioNER/Vitaly_NER](https://huggingface.co/FusioNER/Vitaly_NER) | IAHALT | [FusioNER/Vitaly](https://huggingface.co/datasets/FusioNER/Vitaly) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | Vitaly |
| **Name-Sentences** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) | [FusioNER/Name-Sentences](https://huggingface.co/FusioNER/Name-Sentences) | IAHALT | [FusioNER/Name_Sentences](https://huggingface.co/datasets/FusioNER/Name_Sentences) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Entity-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Entity-Injection](https://huggingface.co/FusioNER/Entity-Injection) | IAHALT | [FusioNER/Entity_Injection](https://huggingface.co/datasets/FusioNER/Entity_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Smart_Injection** | Training on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/Smart_Injection](https://huggingface.co/FusioNER/Smart_Injection) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **NEMO** | Basic training on the NEMO dataset | [FusioNER/Nemo](https://huggingface.co/FusioNER/Nemo) | NEMO | [FusioNER/NEMO](https://huggingface.co/datasets/FusioNER/NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO** | Basic training on IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **IAHALT_and_NEMO_PP** | Training on IAHALT + NEMO + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) | [FusioNER/IAHALT_and_NEMO_and_PP](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | IAHALT + NEMO | [FusioNER/IAHALT_and_NEMO_PP](https://huggingface.co/datasets/FusioNER/IAHALT_and_NEMO_PP) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **Animals** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of animal names as PER entities) | [FusioNER/Animals](https://huggingface.co/FusioNER/Animals) | IAHALT | [FusioNER/Animals](https://huggingface.co/datasets/FusioNER/Animals) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **PRS-Injection** | Training on IAHALT + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) (of PRS names as PER entities) | [FusioNER/PRS-Injection](https://huggingface.co/FusioNER/PRS-Injection) | IAHALT | [FusioNER/PRS_locations](https://huggingface.co/datasets/FusioNER/PRS_locations) | [HeRo](https://huggingface.co/HeNLP/HeRo) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Basic** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/Dicta_Small_Basic](https://huggingface.co/FusioNER/Dicta_Small_Basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Small_Smart** | Training the [DICTA](https://huggingface.co/dicta-il/dictabert) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Small_Smart](https://huggingface.co/FusioNER/Dicta_Small_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA](https://huggingface.co/dicta-il/dictabert) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_basic_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on the [basic](https://huggingface.co/datasets/FusioNER/Basic) IAHALT dataset | [FusioNER/DICTA_basic](https://huggingface.co/FusioNER/DICTA_basic) | IAHALT | [FusioNER/Basic](https://huggingface.co/datasets/FusioNER/Basic) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_smart_NER** | Training the [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/DICTA_Smart](https://huggingface.co/FusioNER/DICTA_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA-ner](https://huggingface.co/dicta-il/dictabert-ner) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **DICTA_Large_Smart** | Training the [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) model on IAHALT + [Name-Sentences](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#1-name-sentences) + [Entity-Injection](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#2-entity-injection) ([dataset](https://huggingface.co/datasets/FusioNER/Smart_Injection)) | [FusioNER/Dicta_Large_Smart](https://huggingface.co/FusioNER/Dicta_Large_Smart) | IAHALT | [FusioNER/Smart_Injection](https://huggingface.co/datasets/FusioNER/Smart_Injection) | [DICTA Large](https://huggingface.co/dicta-il/dictabert-large) | [classic[4]](https://huggingface.co/spaces/FusioNER/README/blob/main/README.md#4-classic) | [Etzion](https://huggingface.co/etzion) |
| **TEC_NER** | Basic technology NER model | [FusioNER/tec_ner](https://huggingface.co/FusioNER/tec_ner) | TEC_NER | [FusioNER/tec_ner](https://huggingface.co/datasets/FusioNER/tec_ner) | base model | TEC | [Yehoshua](https://huggingface.co/yehoshuadiller) |

# Results

We tested our models on the **IAHALT test set**. We also evaluated other models, such as [DictaBert](https://huggingface.co/dicta-il/dictabert) and [HeBert](https://huggingface.co/avichr/heBERT).
The performance results are:

| Model name | Precision | Recall | F1 score | Time (in seconds) |
| :--------- | :-------: | :----: | :---------: | :---------------: |
| [**IAHALT_and_NEMO_PP**](https://huggingface.co/FusioNER/IAHALT_and_NEMO_and_PP) | 0.714 | 0.353 | 0.461 | 83.128 |
| [**HeBert**](https://huggingface.co/avichr/heBERT) | 0.574 | 0.474 | 0.494 | 86.483 |
| [**NEMO**](https://huggingface.co/FusioNER/Nemo) | 0.553 | 0.51 | 0.525 | 81.422 |
| [**IAHALT_and_NEMO**](https://huggingface.co/FusioNER/IAHALT_and_NEMO) | 0.692 | 0.678 | 0.684 | 83.702 |
| [**Vitaly**](https://huggingface.co/FusioNER/Vitaly_NER) | 0.883 | 0.794 | 0.836 | 83.773 |
| [**DictaBert**](https://huggingface.co/dicta-il/dictabert) | 0.916 | 0.834 | 0.872 | **70.465** |
| [**DICTA_large**](https://huggingface.co/dicta-il/dictabert-large) | **0.917** | 0.845 | 0.879 | 206.251 |
| [**Name-Sentences**](https://huggingface.co/FusioNER/Name-Sentences) | 0.895 | 0.865 | 0.879 | 82.674 |
| [**Basic**](https://huggingface.co/FusioNER/Basic_IAHALT) | 0.897 | 0.866 | 0.881 | 84.479 |
| [**Smart_Injection**](https://huggingface.co/FusioNER/Smart_Injection) | 0.898 | 0.867 | 0.881 | 82.253 |
| [**DICTA_Basic**](https://huggingface.co/FusioNER/Dicta_Small_Basic) | 0.903 | **0.875** | 0.888 | **69.419** |
| [**DICTA_Large_Smart**](https://huggingface.co/FusioNER/Dicta_Large_Smart) | 0.904 | **0.875** | **0.889** | 204.324 |
| [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) | 0.904 | **0.875** | **0.889** | **70.29** |

According to the results, we recommend using the [**DICTA_Small_Smart**](https://huggingface.co/FusioNER/Dicta_Small_Smart) model.
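One plausible reading of this recommendation is "highest F1, with ties broken by the shorter run time". The sketch below applies that rule to the numbers from the results table. The selection rule itself and the name `pick_model` are assumptions made for illustration, not something the project states.

```python
# (name, precision, recall, f1, seconds), copied from the results table.
RESULTS = [
    ("IAHALT_and_NEMO_PP", 0.714, 0.353, 0.461, 83.128),
    ("HeBert", 0.574, 0.474, 0.494, 86.483),
    ("NEMO", 0.553, 0.51, 0.525, 81.422),
    ("IAHALT_and_NEMO", 0.692, 0.678, 0.684, 83.702),
    ("Vitaly", 0.883, 0.794, 0.836, 83.773),
    ("DictaBert", 0.916, 0.834, 0.872, 70.465),
    ("DICTA_large", 0.917, 0.845, 0.879, 206.251),
    ("Name-Sentences", 0.895, 0.865, 0.879, 82.674),
    ("Basic", 0.897, 0.866, 0.881, 84.479),
    ("Smart_Injection", 0.898, 0.867, 0.881, 82.253),
    ("DICTA_Basic", 0.903, 0.875, 0.888, 69.419),
    ("DICTA_Large_Smart", 0.904, 0.875, 0.889, 204.324),
    ("DICTA_Small_Smart", 0.904, 0.875, 0.889, 70.29),
]


def pick_model(results):
    """Return the row with the highest F1; ties go to the faster model.

    Assumed rule (hypothetical): sort by descending F1, then ascending time.
    """
    return min(results, key=lambda row: (-row[3], row[4]))


best = pick_model(RESULTS)
print(best[0])  # DICTA_Small_Smart
```

Under this rule, DICTA_Large_Smart and DICTA_Small_Smart tie on F1 (0.889), and the smaller model wins because it ran in about a third of the time.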
# Hebrew NLP models

The following table lists Hebrew NLP models:

| Model name | Link | Creator |
|:-----------|:-----|:--------|
| HeNLP/HeRo | [https://huggingface.co/HeNLP/HeRo](https://huggingface.co/HeNLP/HeRo) | Vitaly Shalumov and Harel Haskey |
| dicta-il/dictabert | [https://huggingface.co/dicta-il/dictabert](https://huggingface.co/dicta-il/dictabert) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| dicta-il/dictabert-large | [https://huggingface.co/dicta-il/dictabert-large](https://huggingface.co/dicta-il/dictabert-large) | Shaltiel Shmidman and Avi Shmidman and Moshe Koppel |
| avichr/heBERT | [https://huggingface.co/avichr/heBERT](https://huggingface.co/avichr/heBERT) | Avihay Chriqui and Inbal Yahav |

# Footnotes

#### [1] **Name-Sentences**:

Adding sentences to the corpus that contain only the entity we want the network to learn.

#### [2] **Entity-Injection**:

Replacing a tagged entity in the original corpus with a new entity. With this method, the model can learn new entities (not labels!) that it did not extract before.

#### [3] **BI-BI Problem**:

Building a training corpus in which entities of the same type that appear in sequence are labeled as continuations of one another. For example, the text "הארי פוטר ורון וויזלי" (Harry Potter and Ron Weasley) would be tagged as a **SINGLE** entity. This problem prevents the model from extracting entities correctly.
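The BI-BI problem in footnote [3] can be illustrated with a small span decoder. This is a minimal sketch that assumes a BIO tagging scheme (`B-` opens an entity, `I-` continues it, `O` is outside) and whitespace tokens; the function name `decode_spans` and the exact tag strings are hypothetical and are not taken from the project's training code.

```python
def decode_spans(tokens, tags):
    """Group BIO tags into (entity_type, text) spans."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current_type):
            # A new entity starts: close the open one, if any.
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [token], etype
        elif prefix == "I":
            # Continuation of the currently open entity.
            current.append(token)
        else:
            # "O": outside any entity, so close the open one.
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        spans.append((current_type, " ".join(current)))
    return spans


tokens = ["הארי", "פוטר", "ורון", "וויזלי"]

# Intended labeling: two PER entities, each opened with its own B- tag.
correct = ["B-PER", "I-PER", "B-PER", "I-PER"]
# BI-BI problem: the second name is labeled as a continuation of the first.
problem = ["B-PER", "I-PER", "I-PER", "I-PER"]

print(len(decode_spans(tokens, correct)))  # 2
print(len(decode_spans(tokens, problem)))  # 1
```

With the problematic labels, both names collapse into one span, which is the behavior a model trained on such a corpus would tend to reproduce.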
#### [4] **Classic**:

The classic NER types:

| entity type | full name | examples |
|:-----------:|:----------| --------:|
| **PER** | Person | אדולף היטלר, רודולף הס, מרדכי אנילביץ |
| **GPE** | Geopolitical Entity | גרמניה, פולין, ברלין, וורשה |
| **LOC** | Location | מזרח אירופה, אגן הים התיכון, הגליל |
| **FAC** | Facility | אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן |
| **ORG** | Organization | המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב |
| **TIMEX** | Time Expression | 1945, שנת 1993, יום השואה, שנות ה-90 |
| **EVE** | Event | השואה, מלחמת העולם השנייה, שלטון האפרטהייד |
| **TTL** | Title | פיהרר, קיסר, מנכ"ל |
| **ANG** | Language | עברית, ערבית, גרמנית |
| **DUC** | Product | פייסבוק, F-16, תנובה |
| **WOA** | Work of Art | דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000 |
| **MISC** | Miscellaneous | קורונה, התו הירוק, מדלית זהב, ביטקוין |

# Datasets for English NER (for cleaning wrong entities in English texts):

- [**ontonotes5**](https://huggingface.co/datasets/tner/ontonotes5)
- [**conll2003**](https://huggingface.co/datasets/eriktks/conll2003)

**MIT License**