MiniLM-L6-H384-uncased trained on gooaq

This is a sentence-transformers model fine-tuned from nreimers/MiniLM-L6-H384-uncased on the gooaq dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The model is asymmetric: after pooling, queries and documents pass through separate Dense layers (see Full Model Architecture).

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nreimers/MiniLM-L6-H384-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: gooaq
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (asym): Asym(
    (query-0): Dense({'in_features': 384, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
    (doc-0): Dense({'in_features': 384, 'out_features': 384, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  )
)
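
The architecture above can be read as three stages: token embeddings from the transformer, mean pooling over the non-padded tokens, and a Dense layer with Tanh activation selected by the input key (query or doc). The following is a minimal pure-Python sketch of that flow under stated assumptions: it uses toy 2-dimensional vectors instead of 384 dimensions, and the helper names (mean_pool, dense_tanh, asym_forward) and the weights are hypothetical, not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]

def dense_tanh(vector, weights, bias):
    """Apply y = tanh(W x + b), one output per row of W."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weights, bias)
    ]

# Hypothetical toy parameters: one independent Dense layer per input key.
ASYM_LAYERS = {
    "query": {"weights": [[1.0, 0.0], [0.0, 1.0]], "bias": [0.0, 0.0]},
    "doc": {"weights": [[0.0, 1.0], [1.0, 0.0]], "bias": [0.0, 0.0]},
}

def asym_forward(key, token_embeddings, attention_mask):
    """Pool the tokens, then route through the Dense layer for this key."""
    pooled = mean_pool(token_embeddings, attention_mask)
    layer = ASYM_LAYERS[key]
    return dense_tanh(pooled, layer["weights"], layer["bias"])

tokens = [[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]  # the last token is padding
mask = [1, 1, 0]
query_vec = asym_forward("query", tokens, mask)
doc_vec = asym_forward("doc", tokens, mask)
```

The same pooled vector yields different embeddings for the two keys, which is the point of the asymmetric design: both projections share one encoder but are learned separately.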

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/MiniLM-L6-H384-uncased-gooaq-asym")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
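
Since the similarity function listed for this model is cosine similarity, the 3 x 3 matrix above holds the pairwise cosine of every embedding with every other. As a rough illustration of what that computes, here is a dependency-free sketch on toy 2-dimensional vectors; cosine_similarity and similarity_matrix are hypothetical helper names, not library functions.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings):
    """Pairwise cosine similarity; entry [i][j] compares rows i and j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]

# Toy stand-ins for real embeddings (assumption: any non-zero vectors work).
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy_embeddings)
```

The diagonal is 1 because each vector is compared with itself, and the matrix is symmetric.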

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.1588
cosine_accuracy@3 0.2785
cosine_accuracy@5 0.3457
cosine_accuracy@10 0.4466
cosine_precision@1 0.1588
cosine_precision@3 0.0928
cosine_precision@5 0.0691
cosine_precision@10 0.0447
cosine_recall@1 0.1588
cosine_recall@3 0.2785
cosine_recall@5 0.3457
cosine_recall@10 0.4466
cosine_ndcg@10 0.2882
cosine_mrr@10 0.2393
cosine_map@100 0.2521
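
In the table above, accuracy@k equals recall@k, and precision@k is close to accuracy@k divided by k (for example 0.2785 / 3 ≈ 0.0928). That pattern is what one would expect if each query has exactly one relevant answer, which is assumed in the sketch below. The function name retrieval_metrics and the example ranks are hypothetical; this is an illustration of the metric definitions, not the evaluator's code.

```python
import math

def retrieval_metrics(ranks, k):
    """Compute @k metrics when every query has exactly one relevant document.

    ranks[i] is the 1-based rank of the relevant document for query i,
    or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # At most one of the k retrieved documents can be relevant.
        "precision": accuracy / k,
        # Only one relevant document exists, so a hit means full recall.
        "recall": accuracy,
        "mrr": sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n,
        # The ideal DCG is 1 (relevant document at rank 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n,
    }

# Hypothetical ranks for four queries; the third query missed entirely.
example = retrieval_metrics([1, 3, None, 2], k=3)
```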

Training Details

Training Dataset

gooaq

  • Dataset: gooaq at b089f72
  • Size: 3,012,496 training samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    question answer
    type dict dict
    details
  • Samples:
    question answer
    {'query': 'what is the difference between broilers and layers?'} {'doc': 'An egg laying poultry is called egger or layer whereas broilers are reared for obtaining meat. So a layer should be able to produce more number of large sized eggs, without growing too much. On the other hand, a broiler should yield more meat and hence should be able to grow well.'}
    {'query': 'what is the difference between chronological order and spatial order?'} {'doc': 'As a writer, you should always remember that unlike chronological order and the other organizational methods for data, spatial order does not take into account the time. Spatial order is primarily focused on the location. All it does is take into account the location of objects and not the time.'}
    {'query': 'is kamagra same as viagra?'} {'doc': 'Kamagra is thought to contain the same active ingredient as Viagra, sildenafil citrate. In theory, it should work in much the same way as Viagra, taking about 45 minutes to take effect, and lasting for around 4-6 hours. However, this will vary from person to person.'}
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
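
To make the loss parameters concrete, the following is a simplified pure-Python sketch of a multiple-negatives ranking objective with in-batch negatives: for each query, its own answer is the positive and every other answer in the batch is a negative, with cosine similarities multiplied by the scale (20.0) before a softmax cross-entropy. The function names are hypothetical and the library's actual implementation may differ in detail.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def multiple_negatives_ranking_loss(query_embs, doc_embs, scale=20.0):
    """Mean cross-entropy where doc i is the positive for query i."""
    losses = []
    for i, query in enumerate(query_embs):
        logits = [scale * cos_sim(query, doc) for doc in doc_embs]
        # Numerically stable log-sum-exp over all documents in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batches (assumption: 2-dimensional embeddings for readability).
aligned = multiple_negatives_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
uniform = multiple_negatives_ranking_loss([[1.0, 0.0]] * 4, [[1.0, 0.0]] * 4)
```

When all similarities are equal, this formula gives the natural log of the batch size. For a batch of 128 that is about 4.852, which is close to the first training loss reported in the logs (4.9236).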
    

Evaluation Dataset

gooaq

  • Dataset: gooaq at b089f72
  • Size: 3,012,496 evaluation samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    question answer
    type dict dict
    details
  • Samples:
    question answer
    {'query': 'how do i program my directv remote with my tv?'} {'doc': "['Press MENU on your remote.', 'Select Settings & Help > Settings > Remote Control > Program Remote.', 'Choose the device (TV, audio, DVD) you wish to program. ... ', 'Follow the on-screen prompts to complete programming.']"}
    {'query': 'are rodrigues fruit bats nocturnal?'} {'doc': 'Before its numbers were threatened by habitat destruction, storms, and hunting, some of those groups could number 500 or more members. Sunrise, sunset. Rodrigues fruit bats are most active at dawn, at dusk, and at night.'}
    {'query': 'why does your heart rate increase during exercise bbc bitesize?'} {'doc': 'During exercise there is an increase in physical activity and muscle cells respire more than they do when the body is at rest. The heart rate increases during exercise. The rate and depth of breathing increases - this makes sure that more oxygen is absorbed into the blood, and more carbon dioxide is removed from it.'}
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 24
  • bf16: True
  • batch_sampler: no_duplicates
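
The learning_rate and warmup_ratio settings combine with the default linear scheduler roughly as sketched below: the rate rises linearly from 0 to 2e-05 over the first 10% of steps, then decays linearly back to 0. The function name is hypothetical, and the total of 3907 steps is an assumption inferred from the training logs (step 3900 corresponds to epoch 0.9982).

```python
import math

def linear_schedule_with_warmup(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 3907  # assumption: inferred from the log, not stated in the card
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.1)

lr_start = linear_schedule_with_warmup(0, TOTAL_STEPS)
lr_peak = linear_schedule_with_warmup(WARMUP_STEPS, TOTAL_STEPS)
lr_end = linear_schedule_with_warmup(TOTAL_STEPS, TOTAL_STEPS)
```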

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 24
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooaq-dev_cosine_ndcg@10
-1 -1 - - 0.0000
0.0003 1 4.9236 - -
0.0128 50 4.8759 - -
0.0256 100 4.7225 - -
0.0384 150 4.0357 - -
0.0512 200 3.0877 - -
0.0640 250 2.5094 - -
0.0768 300 2.0771 - -
0.0896 350 1.734 - -
0.1024 400 1.4959 - -
0.1152 450 1.308 - -
0.1280 500 1.1529 0.8984 0.0796
0.1408 550 1.0101 - -
0.1536 600 0.9601 - -
0.1664 650 0.8845 - -
0.1792 700 0.8348 - -
0.1920 750 0.7838 - -
0.2048 800 0.7457 - -
0.2176 850 0.6879 - -
0.2304 900 0.6778 - -
0.2432 950 0.6783 - -
0.2560 1000 0.6351 0.4814 0.2080
0.2687 1050 0.6221 - -
0.2815 1100 0.6015 - -
0.2943 1150 0.5738 - -
0.3071 1200 0.5745 - -
0.3199 1250 0.574 - -
0.3327 1300 0.5464 - -
0.3455 1350 0.5257 - -
0.3583 1400 0.5074 - -
0.3711 1450 0.4905 - -
0.3839 1500 0.4633 0.3643 0.2435
0.3967 1550 0.4853 - -
0.4095 1600 0.4587 - -
0.4223 1650 0.4561 - -
0.4351 1700 0.4442 - -
0.4479 1750 0.4399 - -
0.4607 1800 0.4448 - -
0.4735 1850 0.4159 - -
0.4863 1900 0.424 - -
0.4991 1950 0.419 - -
0.5119 2000 0.4049 0.3047 0.2713
0.5247 2050 0.3897 - -
0.5375 2100 0.3873 - -
0.5503 2150 0.3892 - -
0.5631 2200 0.3777 - -
0.5759 2250 0.382 - -
0.5887 2300 0.3703 - -
0.6015 2350 0.3703 - -
0.6143 2400 0.3809 - -
0.6271 2450 0.3576 - -
0.6399 2500 0.3486 0.2686 0.2837
0.6527 2550 0.3395 - -
0.6655 2600 0.3687 - -
0.6783 2650 0.365 - -
0.6911 2700 0.3553 - -
0.7039 2750 0.3446 - -
0.7167 2800 0.3396 - -
0.7295 2850 0.3505 - -
0.7423 2900 0.359 - -
0.7551 2950 0.3239 - -
0.7679 3000 0.3408 0.2474 0.2440
0.7807 3050 0.3217 - -
0.7934 3100 0.3367 - -
0.8062 3150 0.3479 - -
0.8190 3200 0.3278 - -
0.8318 3250 0.3203 - -
0.8446 3300 0.2966 - -
0.8574 3350 0.3298 - -
0.8702 3400 0.3291 - -
0.8830 3450 0.3199 - -
0.8958 3500 0.3302 0.2363 0.2783
0.9086 3550 0.3124 - -
0.9214 3600 0.3136 - -
0.9342 3650 0.3327 - -
0.9470 3700 0.3214 - -
0.9598 3750 0.3214 - -
0.9726 3800 0.3123 - -
0.9854 3850 0.3185 - -
0.9982 3900 0.2999 - -
-1 -1 - - 0.2882

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.057 kWh
  • Carbon Emitted: 0.022 kg of CO2
  • Hours Used: 0.212 hours
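
A few derived quantities follow from the three figures above by simple arithmetic. The snippet below works them out; because the inputs are rounded, the results are approximate, and the variable names are only illustrative.

```python
# Figures reported above (rounded values from the card).
energy_kwh = 0.057
carbon_kg = 0.022
hours = 0.212

# Average power draw over the run, in watts.
average_power_watts = energy_kwh / hours * 1000

# Implied carbon intensity of the electricity, in kg of CO2 per kWh.
carbon_intensity = carbon_kg / energy_kwh

# Run time in minutes.
minutes = hours * 60

print(f"{average_power_watts:.0f} W, {carbon_intensity:.3f} kg CO2/kWh, {minutes:.1f} min")
```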

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.20.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 22.7M params (Safetensors, tensor type F32)