Yihua Zhang

NormalUhr

AI & ML interests

None yet

Recent Activity

published an article 4 days ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

published an article 7 days ago

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

published an article 7 days ago

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

Organizations

Articles 4

Article

17

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article

2

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

Papers 1

arxiv:2402.11846

Models

None public yet

Datasets

None public yet