MiniLLM

Organization Card


Training Small Language Models with Knowledge Distillation

Official pre-trained models and baselines for the following projects:

MiniLLM: Knowledge distillation of LLMs during instruction tuning.
MiniPLM: Knowledge distillation of LLMs during pre-training.


Datasets (10)

MiniLLM/pile-tokenized • Updated Nov 14, 2024 • 74 downloads • 1 like

MiniLLM/roberta-corpus-processed • Updated Oct 22, 2024 • 62 downloads

MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5 • Updated Oct 20, 2024 • 260 downloads

MiniLLM/openwebtext-processed • Updated Sep 27, 2024 • 95 downloads

MiniLLM/dolly-processed • Updated Sep 26, 2024 • 110k rows • 130 downloads • 1 like

MiniLLM/sinst • Updated Sep 26, 2024 • 8.35k rows • 59 downloads • 1 like

MiniLLM/uinst • Updated Sep 26, 2024 • 64.8k rows • 75 downloads • 1 like

MiniLLM/self-inst • Updated Sep 26, 2024 • 242 rows • 62 downloads • 2 likes

MiniLLM/Vicuna • Updated Sep 26, 2024 • 80 rows • 74 downloads • 1 like

MiniLLM/dolly • Updated Sep 26, 2024 • 500 rows • 93 downloads


Collections (2)

MiniLLM/MiniPLM-Qwen-200M

MiniLLM/MiniPLM-Qwen-500M

MiniLLM/MiniPLM-Qwen-1.2B

MiniLLM/MiniPLM-Mamba-130M

MiniLLM/MiniLLM-gpt2-120M

MiniLLM/MiniLLM-gpt2-340M

MiniLLM/MiniLLM-gpt2-760M

MiniLLM/MiniLLM-OPT-1.3B

Models (50 in total; partial list)

MiniLLM/init-gpt2-120M

MiniLLM/teacher-Llama-13B

MiniLLM/MiniLLM-Llama-7B

MiniLLM/Ref-Pretrain-Qwen-104M

MiniLLM/MiniPLM-Mamba-130M

MiniLLM/MiniPLM-Qwen-1.2B

MiniLLM/MiniPLM-Qwen-500M

MiniLLM/MiniPLM-Qwen-200M

MiniLLM/MiniPLM-llama3.1-212M

MiniLLM/Pretrain-Qwen-500M


Team members: 1
