Entity Recognition English Foundation Model by NuMind 🔥

This model provides high-quality token embeddings for the Entity Recognition task in English.

We suggest using the newer version of this model: NuNER v2.0

Check out other models by NuMind:

  • SOTA Multilingual Entity Recognition Foundation Model: link
  • SOTA Sentiment Analysis Foundation Model: English, Multilingual

About

RoBERTa-base fine-tuned on NuNER data.

Metrics:

Read more about the evaluation protocol and datasets in our paper and blog post.

Model                                  k=1   k=4   k=16  k=64
RoBERTa-base                           24.5  44.7  58.1  65.4
RoBERTa-base + NER-BERT pre-training   32.3  50.9  61.9  67.6
NuNER v0.1                             34.3  54.6  64.0  68.7
NuNER v1.0                             39.4  59.6  67.8  71.5
NuNER v2.0                             43.6  61.0  68.2  72.0

Usage

Embeddings can be used out of the box or fine-tuned on specific datasets.

Get embeddings:

import torch
import transformers


model = transformers.AutoModel.from_pretrained(
    'numind/NuNER-v0.1',
    output_hidden_states=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    'numind/NuNER-v0.1'
)

text = [
    "NuMind is an AI company based in Paris and USA.",
    "See other models from us on https://huggingface.co/numind"
]
encoded_input = tokenizer(
    text,
    return_tensors='pt',
    padding=True,
    truncation=True
)
output = model(**encoded_input)

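# For better quality: concatenate the last layer with a middle layer
# (hidden_states[-7], the output of layer 6 of 12), giving 1536-dim token embeddings
emb = torch.cat(
    (output.hidden_states[-1], output.hidden_states[-7]),
    dim=2
)

# For better speed: use only the last layer (768-dim token embeddings)
# emb = output.hidden_states[-1]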
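
To illustrate how these embeddings might feed a downstream tagger, here is a minimal sketch of a linear token-classification head placed on top of the concatenated layers. The names `concat_layers` and `TokenClassifierHead`, the choice of three labels, and the random tensors standing in for `output.hidden_states` are assumptions made for illustration only; they are not part of the NuNER release, and the head shown here is untrained.

```python
import torch
from torch import nn


def concat_layers(hidden_states, layers=(-1, -7)):
    """Concatenate the selected hidden-state layers along the feature axis."""
    return torch.cat([hidden_states[i] for i in layers], dim=2)


class TokenClassifierHead(nn.Module):
    """Hypothetical linear head mapping token embeddings to label logits."""

    def __init__(self, emb_dim, num_labels):
        super().__init__()
        self.linear = nn.Linear(emb_dim, num_labels)

    def forward(self, emb):
        # emb: (batch, seq_len, emb_dim) -> logits: (batch, seq_len, num_labels)
        return self.linear(emb)


# Stand-in for output.hidden_states (assumption): RoBERTa-base returns
# 13 tensors (embedding output + 12 layers), each (batch, seq_len, 768).
hidden_states = tuple(torch.randn(2, 5, 768) for _ in range(13))

emb = concat_layers(hidden_states)                    # (2, 5, 1536)
head = TokenClassifierHead(emb.shape[-1], num_labels=3)
logits = head(emb)                                    # (2, 5, 3)
```

With the real model, `hidden_states` would be replaced by `output.hidden_states`, and positions where `encoded_input['attention_mask']` is 0 would typically be excluded from the loss, since they correspond to padding.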

Citation

@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data}, 
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}