UltraFeedback: Boosting Language Models with High-quality Feedback • Paper 2310.01377 • Published Oct 2, 2023
Group Robust Preference Optimization in Reward-free RLHF • Paper 2405.20304 • Published May 30, 2024