|
---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
datasets:
- Intel/orca_dpo_pairs
---
|
|
|
This model is derived from blockchainlabs test 2.4 (alnrg2arg/blockchainlabs_7B_merged_test2_4).
|
|
|
This project aims to build a small LLM for on-device use.
|
|
|
The overall pipeline for this iteration is:

1. Merge models to build a 7B base model.
2. Prune the merged model to reduce its parameters (50% sparsity).
3. Run DPO as the recovery phase after pruning.
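
As a rough illustration of step 2, here is a minimal sketch of unstructured 50% magnitude pruning with PyTorch's `torch.nn.utils.prune`. This is an assumption for illustration only, not the pruning code actually used in this project.

```python
# Hypothetical sketch of step 2: unstructured 50% magnitude pruning.
# Not the project's actual pruning code; it only illustrates zeroing out
# half of each linear layer's weights.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "alnrg2arg/blockchainlabs_7B_merged_test2_4",  # the merged 7B base model
    torch_dtype=torch.bfloat16,
)

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)  # 50% sparsity
        prune.remove(module, "weight")  # bake the pruning mask into the weights
```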
|
|
|
This model, which is not pruned, is intended as a baseline for comparison with the pruned model.
|
|
|
This is the code and the parameters chosen for DPO training of this model.
|
```python
import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

# `model`, `tokenizer`, and `dataset` are assumed to be prepared beforehand
# (see the setup sketch below the attribution link).
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,                  # TRL creates the reference model internally when None
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,                        # DPO temperature
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
```
|
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
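
The `model`, `tokenizer`, and `dataset` objects used by the DPO code above are assumed to be prepared beforehand. Below is a minimal sketch of what that preparation might look like, assuming the Intel/orca_dpo_pairs dataset listed in the metadata and a simple `prompt`/`chosen`/`rejected` mapping; the column handling is an assumption, not taken from the original notebook.

```python
# Hypothetical setup for the objects referenced by DPOTrainer above;
# the linked notebook may prepare them differently.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "alnrg2arg/blockchainlabs_7B_merged_test2_4"
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral-style tokenizers often lack a pad token

# Intel/orca_dpo_pairs provides system/question/chosen/rejected columns;
# DPOTrainer expects prompt/chosen/rejected.
raw = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(row):
    return {
        "prompt": row["system"] + "\n" + row["question"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }

dataset = raw.map(to_dpo_format, remove_columns=raw.column_names)
```

With these objects in place, training is started with `dpo_trainer.train()`.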
|
|
|
|
|
Benchmark Scores
|
|
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------|------:|------|-----:|--------|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |0.6894|± |0.0135|
| | |none | 0|acc_norm|0.6860|± |0.0136|
|
|
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------|------:|------|-----:|--------|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |0.7092|± |0.0045|
| | |none | 0|acc_norm|0.8736|± |0.0033|
|
|
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|--------------|------:|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2| 2|none | 0|acc |0.7126|± | 0.015|
|
|
|
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6225|± |0.1292|
| - humanities |N/A |none | 0|acc |0.5745|± |0.1286|
| - other |N/A |none | 0|acc |0.6952|± |0.1095|
| - social_sciences|N/A |none | 0|acc |0.7280|± |0.0735|
| - stem |N/A |none | 0|acc |0.5195|± |0.1313|
|
|
|
| Tasks |Version|Filter|n-shot|Metric|Value| |Stderr|
|----------|------:|------|-----:|------|----:|---|-----:|
|winogrande| 1|none | 0|acc |0.824|± |0.0107|
|
|
|
|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k| 2|get-answer| 5|exact_match|0.7263|± |0.0123|
|
|
|
Average of the six benchmark scores (ARC acc_norm, HellaSwag acc_norm, MMLU, TruthfulQA mc2, Winogrande, GSM8K) = 74.08
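
The tables above follow the output format of EleutherAI's lm-evaluation-harness. Below is a hedged sketch of how comparable numbers could be reproduced with the harness's Python API; the harness version (v0.4+ assumed), batch size, and model id are assumptions, so substitute this repository's id before running.

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (v0.4+ Python API assumed).
# Few-shot settings mirror the tables above: 0-shot everywhere except 5-shot GSM8K.
import lm_eval

MODEL_ID = "path/or/repo-id-of-this-model"  # placeholder; substitute the actual repo id

zero_shot = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL_ID},dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2", "mmlu", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)

five_shot = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={MODEL_ID},dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)

print(zero_shot["results"])
print(five_shot["results"])
```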