AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
Models (21)
RLHFlow/Decision-Tree-Reward-Gemma-2-27B • Text Classification • Updated • 68 • 3
RLHFlow/Decision-Tree-Reward-Llama-3.1-8B • Text Classification • Updated • 359 • 3
RLHFlow/Llama3.1-8B-PRM-Mistral-Data • Text Generation • Updated • 796 • 8
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • Updated • 15.8k • 32
RLHFlow/Llama3.1-8B-ORM-Deepseek-Data • Text Generation • Updated • 667
RLHFlow/Llama3.1-8B-ORM-Mistral-Data • Text Generation • Updated • 128
RLHFlow/Llama3-v2-iterative-DPO-iter3 • Text Generation • Updated • 183 • 1
RLHFlow/Llama3-v2-iterative-DPO-iter2 • Text Generation • Updated • 16
RLHFlow/Llama3-v2-iterative-DPO-iter1 • Text Generation • Updated • 17
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated • 1.31k • 2
Datasets (75)
RLHFlow/numia_prompt_dpo_test • Viewer • Updated • 1.02k
RLHFlow/numia_prompt_dpo9 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo8 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo7 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo6 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo5 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo4 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo3 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo2 • Viewer • Updated • 20k
RLHFlow/numia_prompt_dpo1 • Viewer • Updated • 20k