mlfoundations-dev/dpo_from_stratos_judged_annotated_rejected_responses Text Generation • Updated 6 days ago • 297 • 1
mlfoundations-dev/dpo_from_multiple_samples_shortest_numina_aime Text Generation • Updated 6 days ago • 343