Model Checkpoints in the ExPO Paper - a chujiezheng Collection

Checkpoints released with the paper "Weak-to-Strong Extrapolation Expedites Alignment" (ExPO).

Updated May 19, 2024

chujiezheng/zephyr_0.05

Text Generation • Updated Apr 28, 2024 • 6

Note: zephyr-7b-sft-full trained with DPO on 5% of the UltraFeedback data

chujiezheng/zephyr_0.1

Text Generation • Updated Apr 28, 2024 • 58

Note: zephyr-7b-sft-full trained with DPO on 10% of the UltraFeedback data

chujiezheng/zephyr_0.1_a8.0

Text Generation • Updated Apr 28, 2024 • 7

Note: zephyr_0.1 extrapolated with ExPO, alpha = 8.0

chujiezheng/zephyr_0.2

Text Generation • Updated Apr 28, 2024 • 10

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data

chujiezheng/zephyr_0.2_a2.5

Text Generation • Updated Apr 28, 2024 • 5

Note: zephyr_0.2 extrapolated with ExPO, alpha = 2.5

chujiezheng/zephyr_0.4

Text Generation • Updated Apr 28, 2024 • 4

Note: zephyr-7b-sft-full trained with DPO on 40% of the UltraFeedback data

chujiezheng/zephyr_0.2_2lr

Text Generation • Updated Apr 25, 2024 • 5

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data, using twice the learning rate

chujiezheng/zephyr_0.2_3lr

Text Generation • Updated Apr 25, 2024 • 6

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data, using three times the learning rate

chujiezheng/zephyr_0.2_2ep

Text Generation • Updated Apr 25, 2024 • 5

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data, for twice as many epochs

chujiezheng/zephyr_0.2_3ep

Text Generation • Updated Apr 25, 2024 • 5

Note: zephyr-7b-sft-full trained with DPO on 20% of the UltraFeedback data, for three times as many epochs
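The alpha values in the notes above refer to ExPO's model extrapolation. The following is a minimal sketch, assuming the extrapolation rule theta_expo = theta_dpo + alpha * (theta_dpo - theta_sft), with zephyr-7b-sft-full as theta_sft and a DPO checkpoint (e.g. zephyr_0.1) as theta_dpo. The `extrapolate` helper and the dict-of-float-lists parameter format are hypothetical stand-ins for real model state dicts, not the authors' code.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def extrapolate(sft: Params, dpo: Params, alpha: float) -> Params:
    """Return dpo + alpha * (dpo - sft), parameter by parameter.

    Assumes the ExPO-style rule; alpha = 0 returns the DPO weights unchanged.
    """
    if sft.keys() != dpo.keys():
        raise ValueError("checkpoints must have the same parameter names")
    out: Params = {}
    for name, dpo_vals in dpo.items():
        sft_vals = sft[name]
        if len(sft_vals) != len(dpo_vals):
            raise ValueError(f"shape mismatch for {name}")
        # Move further along the SFT -> DPO direction by a factor of alpha.
        out[name] = [d + alpha * (d - s) for s, d in zip(sft_vals, dpo_vals)]
    return out


if __name__ == "__main__":
    sft = {"w": [1.0, 2.0]}
    dpo = {"w": [1.5, 1.0]}
    print(extrapolate(sft, dpo, alpha=2.5))  # {'w': [2.75, -1.5]}
```

For real checkpoints, the same per-parameter arithmetic would be applied to matching tensors of the two models; a larger alpha pushes the result further past the DPO model along the direction from the SFT model.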